I, Peter Filipcik, declare that:

1. I am a co-inventor of the above-referenced patent application. I am also an employee of 
Axon Neuroscience, the assignee of the above-referenced application. A copy of my
Curriculum Vitae is attached as Exhibit 1.

2. I am a co-author of the publication Zilka et al., "Truncated tau from sporadic Alzheimer's
disease suffices to drive neurofibrillary degeneration in vivo," FEBS Letters 580:3582-3588
(2006) (hereinafter, the "Zilka reference").

3. It is my understanding that the Examiner in charge of the above-captioned application has 
advanced an enablement rejection against claims 17-33. I am supplying this declaration 
to provide additional evidence of the enablement of the present claims. In particular, this
declaration provides additional data on transgenic rat line #318, which is the same
transgenic rat line #318 described in the present patent application, demonstrating that a
transgenic animal having a DNA construct coding for N- and C-terminally truncated tau
molecules according to the present invention exhibits phenotypes that make it a suitable
model for Alzheimer's disease. 

4. Attached as Exhibit 2 is Zilka et al., "Truncated tau from sporadic Alzheimer's disease 
suffices to drive neurofibrillary degeneration in vivo," FEBS Letters 580:3582-3588 
(2006). The Zilka reference describes the generation of and studies on transgenic rat line 
#318. Transgenic rat line #318 is the same transgenic rat line #318 described in the 
specification of the present patent application. See, e.g., Specification, p. 22, first full 
paragraph; p. 23, first full paragraph; and Fig. 3C.

5. According to the teachings in the present specification, the DNA constructs used for 
transgenic animal preparation in the Zilka reference are characterized by the following 
features: (1) the cDNA molecules are truncated at least 30 nucleotides downstream of the
start codon and truncated at least the 30 nucleotides upstream of the stop codon of the
full-length tau cDNA sequence coding for 4-repeat and 3-repeat tau protein; (2) the
cDNA molecule comprises SEQ ID No. 9; and (3) the DNA construct encodes a protein,
which has neurofibrillary (NF) pathology producing activity when expressed in brain 
cells. 

6. The transgene construct used in the generation of transgenic rat lines #318 and #72 was 
prepared by ligation of a cDNA coding for human tau protein truncated at amino acid
positions 151-391 into the mouse Thy-1 gene downstream of the brain promoter/enhancer
sequence. Zilka, p. 3582, col. 2. Transgenic rat line #318 is the same rat line described
in the present patent application (see, e.g., Example 2). It should be noted that the
numbering of the amino acids of the tau protein in the Zilka reference is based on tau 
isoform 40, whereas the numbering in the present patent application is based on tau 
isoform 43. Tau isoform 40 contains an extra insert of 58 amino acids (174 nucleotides) 
in the N-terminus of the protein. Thus, the truncated tau protein numbered amino acids 
151-391 in the Zilka reference is the same as a truncated tau protein numbered amino 
acids 93-333 based on the numbering in the patent application. Using the numbering in 
the patent application, amino acids 93-333 correspond to nucleotides 279-999. Thus, the
truncated tau cDNA molecule used to generate rat line #318 is truncated at least 30 
nucleotides downstream of the start codon and truncated at least the 30 nucleotides 
upstream of the stop codon of the full-length tau cDNA sequence coding for 4-repeat and 
3-repeat tau protein; and the truncated tau cDNA molecule comprises SEQ ID NO: 9 
(nucleotides 741-930). 
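
The numbering conversion in paragraph 6 can be checked with a short calculation. The following is a minimal sketch and not part of the original declaration. The function names are hypothetical. The 58-amino-acid offset and the residue and nucleotide ranges are taken from the paragraph above. The mapping from amino acid number to nucleotide number (position times three) is an assumption inferred from the figures the declaration reports (93 x 3 = 279, 333 x 3 = 999).

```python
from typing import Tuple

# Offset stated in the declaration: tau isoform 40 carries an extra
# 58-amino-acid insert in the N-terminus relative to tau isoform 43.
INSERT_LENGTH_AA = 58


def to_isoform43_numbering(start_aa: int, end_aa: int) -> Tuple[int, int]:
    """Convert isoform-40 residue numbers to isoform-43 numbering.

    Assumption: both residues lie downstream of the insert, so a plain
    subtraction of the insert length is sufficient.
    """
    return start_aa - INSERT_LENGTH_AA, end_aa - INSERT_LENGTH_AA


def to_nucleotide_number(aa_position: int) -> int:
    """Map a residue number to a nucleotide number (assumed: position x 3)."""
    return 3 * aa_position


def contains(outer: Tuple[int, int], inner: Tuple[int, int]) -> bool:
    """Return True if the inner range lies entirely within the outer range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Residues 151-391 (Zilka numbering) expressed in the application's numbering.
aa_range = to_isoform43_numbering(151, 391)
nt_range = (to_nucleotide_number(aa_range[0]), to_nucleotide_number(aa_range[1]))

# SEQ ID NO: 9 spans nucleotides 741-930 according to the declaration.
seq_id_9 = (741, 930)
covers_seq_id_9 = contains(nt_range, seq_id_9)

print(aa_range, nt_range, covers_seq_id_9)
```

Under these assumptions the sketch reproduces the ranges stated above (amino acids 93-333, nucleotides 279-999) and confirms that the stated SEQ ID NO: 9 range falls inside the truncated cDNA.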

7. The transgenic DNA was linearized by cleavage with EcoRI, and the vector sequences
were removed prior to microinjection. Zilka, p. 3582, col. 2. Transgenic rats were
generated by pronuclear injection of one-day old SHR rat embryos. Id. Founders were
screened by PCR using Thy-1-specific and human tau-specific primers. Id. Two
independent transgenic founder lines, #318 and #72, that stably expressed human
truncated tau were obtained. Zilka, p. 3582, col. 2 to p. 3583, col. 1.

8. The Zilka reference also describes the generation of transgenic rat line #72, which was 
created using the same transgene construct and the same SHR background as used in the 
generation of transgenic rat line #318. See, e.g., Zilka, paragraph spanning pp. 3582-3583.

9. As described in the present specification, transgenic rat line #318 exhibits neurofibrillary
(NF) pathology producing activity when expressed. For example, Fig. 6 shows the 
detection of intracellular inclusions and neurofibrillary filaments using silver staining in 
the neurons of the central nervous system of transgenic rats, whereas wild-type rats did 
not show these structures in the homologous brain area. Figs. 7 and 8 show the detection 
of neurofibrillary tangles in the central nervous system of transgenic rats using the pan-tau
monoclonal antibody DC25 and the monoclonal antibody PHF-1, respectively.
Additionally, Fig. 10 shows a comparison of neurofibrillary tangles detected by Gallyas
silver technique (Fig. 10A and Fig. 10C) and also by immunohistochemistry (Fig. 10E) in
AD diseased human brain, to the equivalent pathological structures observed in the
transgenic rat of the present invention. The observation of neurofibrillary pathology in
transgenic rat line #318 described in the present specification was confirmed by the
studies described in the Zilka reference, in which transgenic rat line #318 was shown to
exhibit neurofibrillary pathology. Zilka, pp. 3582-3583 and Fig. 3. Thus, the studies
presented in the present patent application and in the Zilka reference demonstrate that the 
DNA construct used to make transgenic rat line #318 encodes a protein, which has 
neurofibrillary (NF) pathology producing activity when expressed in brain cells. 

10. The evidence discussed above demonstrates that transgenic rat line #318 contains a DNA
construct having a cDNA molecule coding for N- and C-terminally truncated tau 
molecules having the following features: (1) the cDNA molecule is truncated at least 30 
nucleotides downstream of the start codon and truncated at least the 30 nucleotides 
upstream of the stop codon of the full-length tau cDNA sequence coding for 4-repeat and 
3-repeat tau protein; (2) the cDNA molecule comprises SEQ ID No. 9; and (3) the DNA
construct encodes a protein, which has neurofibrillary (NF) pathology producing activity
when expressed in brain cells.

11. In additional studies with transgenic rat line #318 performed at Axon Neuroscience,
phenotypes including cognitive impairment, oxidative stress, metabolic (energy) stress, 
and phosphorylation have been observed. For example, a statistically significant 
cognitive deficit was measured in transgenic rat line #318 as compared to non-transgenic 
litter mates in a water maze test. Exhibit 3, Fig. 1. As shown in Fig. 2 of Exhibit 3,
transgenic rat line #318 showed increased oxidative stress as a consequence of the
pathological cascade initiated by transgene expression. As shown in Fig. 3 of Exhibit 3,
the kinetic measurement of the creatine kinase reaction showed that the rate constant
values of the brain specific creatine kinase were significantly reduced in transgenic rat line
#318, indicating energy stress. In addition, western blot analysis showed a strong AD-like
phosphorylation pattern of tau protein in transgenic rat line #318. Exhibit 3, Fig. 5.

12. The observed phenotypes described above demonstrate that this transgenic animal is a
suitable model for Alzheimer's disease. 

13. Furthermore, in addition to the transgenic rat lines #318 and #72, which were generated
in the SHR genetic background, the same DNA construct was introduced into the Wistar 
rat genetic background. The transgenic rat line in the Wistar background exhibited the 
same neurofibrillary pathology phenotype as the transgenic rat lines in the SHR 
background. This result indicates that the observed phenotype is associated with the 
expression of the truncated tau protein and not with the genetics of any particular rat line. 

14. As noted in the specification (p. 12, first full paragraph) and in the Zilka reference (p.
3582, col. 2), the Thy-1 promoter was used for the expression of truncated tau. This
Thy-1 promoter is derived from mice and therefore, the constructs would be expected to 
work for a mouse model in addition to the rat models already tested. Furthermore, 
sequencing of the rat genome has revealed a high homology between genomes of the rat
and the mouse (see Exhibit 4 (Rat Genome Sequencing Project Consortium, Nature
428:493-521 (2004), particularly Fig. 7)), and tau protein exhibits high phylogenetic
conservation across a variety of species. There are examples in neurobiology showing
that identical or very homologous gene constructs are responsible for very similar
phenotypes in transgenic animals of different species. For example, expression of
mutated SOD in rats and mice has produced a very similar phenotype (Gurney et al.,
Science, 264:1772-1774 (1994) (Exhibit 5); Howland et al., PNAS, 99(3):1604-1609
(2002) (Exhibit 6)). As a further example, similar results were obtained in modeling
Huntington disease in mice and rats (von Horsten et al., Human Molecular Genetics,
12(6):617-624 (2003) (Exhibit 7); Bates et al., Human Molecular Genetics, 6(10):1633-1637
(1997) (Exhibit 8); Mangiarini et al., Cell, 87(3):493-506 (1996) (Exhibit 9)). In
view of these observations, it can be expected that the same phenotype as observed in the 
transgenic rat can also be observed in transgenic mice expressing the same gene 
construct.

15. I hereby declare that all statements made herein of my knowledge are true and that all
statements made on information and belief are believed to be true; and further that these
statements were made with the knowledge that willful false statements and the like so made
are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the
United States Code and that such willful false statements may jeopardise the validity of the 
application or any patent issued thereon. 



Date

June 1995 PhD Slovak Academy of Sciences, Bratislava, Slovakia

June 1986 RNDr. Comenius University, Faculty of Natural Sciences in Bratislava, Slovakia 

EMPLOYMENT: 

1996-pres Senior scientist - Institute of Neuroimmunology, Slovak Academy of Sciences,

2001-pres Senior scientist - Axon Neuroscience GmbH, Vienna, Austria

1986-1996 Research assistant, Institute of Experimental Endocrinology, Slovak Academy of

Abstract Truncated tau protein is the characteristic feature of
human sporadic Alzheimer's disease. We have identified truncated
tau proteins conformationally different from normal
healthy tau. Subpopulations of these structurally different tau
species promoted abnormal microtubule assembly in vitro,
suggesting toxic gain of function. To validate pathological activity
in vivo we expressed active form of human truncated tau protein
as transgene in the rat brain. Its neuronal expression led to the
development of the neurofibrillary degeneration of Alzheimer's
type. Furthermore, biochemical analysis of neurofibrillary
changes revealed that massive sarcosyl insoluble tau complexes
consisted of human Alzheimer's tau and endogenous rat tau in
ratio 1:1 including characteristic Alzheimer's disease (AD)-specific
proteins (A68). This work represents first insight into the
possible causative role of truncated tau in AD neurofibrillary
degeneration.

© 2006 Federation of European Biochemical Societies. Published
by Elsevier B.V. All rights reserved.



Keywords: Alzheimer's disease; Truncated tau; Microtubule
assembly; Neurofibrillary degeneration; Sarcosyl insoluble tau;
Tau cascade



1. Introduction 

Neurofibrillary structures in Alzheimer's disease are principally
composed of hyperphosphorylated tau [1-4] and truncated
forms of tau protein [2,5,6]. It has been shown that truncation is
closely associated with Alzheimer's disease (AD)-typical
conformational changes of the tau protein [5-10]. We have
hypothesized that truncation could play a major role in AD tau
pathology [11]. This hypothesis originated from the finding that
AD-specific monoclonal antibody 423 (mAb 423) recognizes
truncated tau species in the core of paired helical filaments
(PHF) of Alzheimer's disease [5,6,12,13]. Furthermore, mAb
DC11, raised against sporadic AD-brain derived tau extracts,
recognised all and only those tau proteins that were truncated
at the N-terminus or at both the N- and C-termini [8]. Truncated
tau proteins from sporadic cases of human AD, recognized
by mAb DC11 ("Tau DC11 state"), were further tested in vitro
for their potency to promote microtubule assembly. The
subpopulations of these truncated tau species induced abnormal
microtubule assemblies, suggesting toxic gain of function. In order
to elucidate the role of truncated tau in the AD tau cascade, we
used the truncated tau that was the most active in promotion of
abnormal microtubule assembly as a transgene in the rat brain.
Rats displayed massive neurofibrillary structures induced by
expressed human truncated tau. This shows for the first time that
truncated human tau could serve as a driving force in
neurodegeneration of AD type in vivo.

*Corresponding author. Fax: +421 2 54774276.
E-mail address: Michal.Novak@savba.sk (M. Novak).

1 These authors contributed equally.

Abbreviations: AD, Alzheimer's disease; mAb, monoclonal antibody;
NFT, neurofibrillary tangle; NT, neuropil threads; OD, optical density;
PHF, paired helical filaments; SDS-PAGE, sodium dodecyl
sulfate-polyacrylamide gel electrophoresis

0014-5793/$32.00 © 2006 Federation of European Biochemical Societies.



2. Materials and methods

2.1. Preparation, expression and purification of tau proteins

The preparation of cDNA coding for human tau isoforms and truncated
tau proteins was described elsewhere. All DNA constructs
were cloned in pET17 vector (Novagen) through NdeI-EcoRI restriction
sites. Integrity of each construct was verified by DNA sequence
analysis (ABI Prism 377 DNA Sequencer, Perkin-Elmer). Tau proteins
were expressed in Escherichia coli and purified from bacterial lysates by
ion-exchange chromatography [14]. The protein concentration was
determined by BCA kit (Pierce, USA).

2.2. Microtubule assembly 

Tubulin for microtubule assembly assay was isolated from pig
brains, using reversible assembly purification method [15]. Assay
mixtures contained 1 mg/ml tubulin, 1 mM GTP, recombinant tau
proteins (0.2 mg/ml) in assembly buffer (100 mM Pipes, pH 6.9; 1 mM
MgSO4 and 2 mM EGTA). After gentle and rapid mixing, the samples
were pipetted into quartz microcuvettes and equilibrated at 37 °C in a
thermostatically controlled spectrophotometer (Beckman Coulter).
The turbidity was continuously monitored at 340 nm for a period of
5 min. For electron microscopy, samples were fixed with 1%
glutaraldehyde, put on the formvar/carbon coated 400 mesh copper grid (Agar
Scientific, UK) and stained with 1% aqueous uranyl acetate.

2.3. Preparation of transgene construct and generation of transgenic rats

The transgene construct was prepared by ligation of a cDNA coding
for human tau protein truncated at amino acid positions 151-391 into
the mouse Thy-1 gene downstream of the brain promoter/enhancer
sequence. The original Thy-1 gene sequence coding for exons II-IV,
together with thymus enhancer sequence, was replaced by the cDNA.
Transgenic DNA was linearized by cleavage with EcoRI. Vector
sequences were removed prior to microinjection. Transgenic rats were
generated by pronuclear injection of one-day old SHR rat embryos.
Founders were double screened by PCR using Thy-1-specific and
human tau-specific primers amplifying START and STOP codon
flanking sequences. The rat endogenous tau sequence was used as an
internal amplification control. Two independent transgenic founder
lines (#318 and #72) that stably expressed human truncated tau were
engineered and displayed similar phenotype. The studies described
below were performed using the line #318. The expression level of total
human truncated tau in the transgenic rats was determined by Western
blot analysis using protein extracts from different areas of brain and
spinal cord.

2.4. Monoclonal antibodies

HT7 (Innogenetics, Belgium) recognizes residues 159-163 of human
tau, AT8 (Innogenetics) recognizes phosphoserine 202 and 205, AT180
(Innogenetics) recognizes phosphothreonine 231, PHF1 (a kind gift
from Dr. Peter Davies) recognizes phosphoserine 396 and 404. As a
control we have used pan tau antibody DC25 recognising residues
347-354 (Axon Neuroscience, Austria).

2.5. Histology and immunohistochemistry

Animals were perfused transcardially with 4% paraformaldehyde in
0.1 M phosphate buffered saline, pH 7.2, and the tissues were post-fixed
after perfusion and then cut on cryotome or embedded in paraffin and
cut on microtome. Immunohistochemistry and histopathology were
performed on 50 µm free-floating and 8 µm paraffin embedded
sections. Tissue sections were immunostained using the standard
avidin-biotin-peroxidase method. The modified Gallyas silver iodide, Congo
red and Thioflavin S staining methods were utilized to demonstrate
mature neurofibrillary pathology in neurons [16,17]. Sections were
examined with Olympus BX51 and Zeiss Axiovert 200 microscopes.

2.6. Stereological analysis

The quantified parameters were neuronal and neurofibrillary tangle
(NFT) density. The left brain stems of 10 months old transgenic males
were sectioned on cryostat in the frontal plane. The rostral part of the
gigantocellular reticular nucleus was selected as a representative region
of the reticular formation of the brain stem. NFTs were
immunohistochemically visualized using mAb AT8 and thereafter the sections were
counterstained with cresyl violet. The optical disector principle was
applied [18]; particles (neurons, NFTs) were counted and numerical
densities per mm³ were calculated. The obtained results were corrected to
the number weighted final section thickness [19] to eliminate any
possible bias in the data due to shrinkage of the sections during
histological processing. The study was realized with the aid of a
computer-based stereological system (StereoInvestigator, MicroBrightField,
USA).

2.7. Extraction of sarcosyl insoluble tau

Sarcosyl insoluble tau was isolated from brain tissues of 3-12
months old rats based on the modified method of Greenberg and
Davies [20]. Approximately 2 g of brain tissue was homogenized in 10
volumes of buffer (10 mM Tris, 0.8 M NaCl, 1 mM EGTA and 10%
[...], pH 7.4) and centrifuged at 27200 x g for 20 min. The supernatant
was adjusted to 1% (w/v) N-lauroylsarcosine and incubated 1 h at
[...]. After the incubation, the supernatant was spun at 123000 x g for 1 h at
[...]. The resulting pellet was resuspended in a small volume of
phosphate-buffered saline and analysed in Western blot, and throughout the paper
is designated as P2.

2.8. Western blotting 

Sarcosyl insoluble tau proteins purified from brains were analyzed
on 5-20% SDS-PAGE gradient gel and Western blot as described
previously [14]. Enhanced chemiluminescence developed Western blot was
digitalized with LAS3000 CCD imaging system (Fujifilm, Japan).
Densitometry data analysis and relative quantification of Western blot
record were performed by AIDA Biopackage (Raytest, Germany) as
described [14].



3. Results 

3.1. Truncated tau protein (t151-391) induces abnormal
assembly of microtubules in vitro

Monoclonal antibody DC11, raised using AD brain derived
truncated forms of tau proteins, recognizes "Tau DC11 state"
that represents all and only those truncated tau proteins that
are conformationally different from normal healthy tau
proteins (Fig. 1A-C). The effect of these truncated tau proteins
on the assembly of microtubules was analyzed. The physiological
function of healthy tau is characterized mainly by promotion
of microtubule assembly. The tau efficiency in promotion
of microtubule assembly can be measured by increase in optical
density at 340 nm. DC11 positive truncated tau species, except
t99-441, displayed significantly higher microtubule
assembly promotion activity than normal healthy tau. Short
amino-terminal truncation (t99-441) produces no measurable
difference from normal tau. Strikingly, N- and C-terminally
truncated tau species promote robust microtubule assembly,
3-4 times higher (OD340: 1.6) than normal healthy tau
(OD340: 0.4) (Fig. 1D). For electron microscopy analysis of
microtubule assembly, the mAb DC11 positive double
truncated tau species (t151-391) and normal healthy tau
(t1-441) were selected. Electron micrographs show that normal tau induces
formation of thin microtubular networks (Fig. 1E). However,
interaction of truncated tau species (t151-391) with tubulin
produces abnormally thick microtubular networks (bundles)
(Fig. 1F), different in their appearance from normal
microtubules under the same magnification (3600x).

3.2. AD-like neurofibrillary pathology induced by truncated tau
(t151-391) in vivo

To validate suggested pathological function of truncated tau
in vivo, we generated transgenic rat that overexpressed
truncated tau (t151-391) in the brain and spinal cord (Fig. 2).
The most prominent histopathological feature of transgenic
rats was extensive argyrophilic NFT formation (Fig. 3A). No
neurofibrillary pathology was found in wild type rats throughout
their lifespan. The appearance of NFTs satisfied several
histological criteria used to identify neurofibrillary degeneration
in the human AD including argyrophilia (Fig. 3A), Congo
red birefringence (Fig. 3B) and Thioflavin S reactivity
(Fig. 3C).

The load of neurofibrillary pathology was stereologically
quantified in the brain stem (gigantocellular reticular nucleus)
where the mean NFT density was 690/mm³ with an observed
coefficient of variation of 32.9% (Fig. 3D). The mean NFT:
neuron ratio was 1:8 in transgenic animals. Furthermore,
immunohistochemical analysis revealed that neurofibrillary
tangle formation passed through the histologically well-defined
maturation stages. The first stage was characterized by
intraneuronal pre-tangles, immunoreactive for phosphorylated tau
protein. The antibody AT8 detected the diffuse rod-like
phospho-tau accumulations within the cytoplasm. The pre-tangle
bearing neurons had detectable nuclei and normal appearance
(Fig. 3E). The assembly of pre-tangles resulted in formation of
intracellular NFTs in cell bodies (Fig. 3F) and in processes as
neuropil threads (NTs). The late developmental stage
represented extra-neuronal "ghost" tangles (eNFT) that were
present as immunoreactive, densely packed tau fibrils or bundles
outside the neurons (Fig. 3G). The cell soma revealed no
stainable cytoplasm and nucleus.
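
The reported NFT:neuron ratio can be cross-checked against the stereological densities. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the inputs are the mean NFT density and coefficient of variation given in this section together with the mean neuronal density given in the Fig. 3 caption (5703 neurons/mm³).

```python
# Values reported in Section 3.2 and in the Fig. 3 caption.
NFT_DENSITY_PER_MM3 = 690.0
NEURON_DENSITY_PER_MM3 = 5703.0
NFT_COEFFICIENT_OF_VARIATION = 0.329  # 32.9%


def neurons_per_nft(neuron_density: float, nft_density: float) -> float:
    """Number of neurons per tangle, i.e. the N in an NFT:neuron ratio of 1:N."""
    return neuron_density / nft_density


def sd_from_cv(mean: float, cv: float) -> float:
    """Standard deviation implied by a mean and a coefficient of variation."""
    return mean * cv


ratio = neurons_per_nft(NEURON_DENSITY_PER_MM3, NFT_DENSITY_PER_MM3)
nft_sd = sd_from_cv(NFT_DENSITY_PER_MM3, NFT_COEFFICIENT_OF_VARIATION)

print(f"NFT:neuron ratio is about 1:{ratio:.1f}")
print(f"Implied NFT density SD is about {nft_sd:.0f} per mm^3")
```

The quotient is slightly above 8, which is consistent with the 1:8 ratio stated in the text once rounded.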

3.3. The sarcosyl insoluble tau complexes consisted of human
truncated tau and endogenous rat tau protein

To determine whether truncated tau (t151-391) was able to
induce maturation of neurofibrillary pathology, manifested
by the presence of sarcosyl insoluble tau complexes, we analysed
sarcosyl insoluble protein extracts (P2) from 10 to 12
months old transgenic rats, age-matched control rats (wt)
and from Alzheimer's diseased brain tissues. Western blot
analysis of P2 fraction from transgenic rat, using pan tau
mAb DC25, revealed similar staining pattern to that of human
AD brain (Fig. 4, lanes 3 and 7). Age-matched control
wild type rats had no tau in P2 fraction (Fig. 4, lane 2).
To investigate whether human truncated tau (t151-391)
co-assembled with endogenous rat tau in transgenic rats, we
analyzed the P2 fraction with antibodies reactive with both
human and rat tau (DC25), with human tau only (HT7)
and with endogenous rat tau only (PHF1; human Alzheimer's
truncated tau - transgene - does not contain the PHF1 epitope).
Our results showed that sarcosyl insoluble P2 fractions
from transgenic rats consisted of human tau (Fig. 4, lane 4)
and rat endogenous tau (Fig. 4, lane 5). Phosphorylated
tau immunoreactivities were detected in the same fraction
(P2) with phosphorylation-dependent antibodies, AT8 and
PHF1. AT8 phosphoserines 202 and 205 were present in both
human and rat tau (Fig. 4, lane 6). Abnormal phosphorylation
of rat endogenous tau was detected by PHF1 that does
not recognize human truncated tau (Fig. 4, lane 5). Furthermore,
it is noteworthy that the A68 triplet characteristic of
human AD neurofibrillary degeneration (Fig. 4, lane 8) as
revealed by PHF1 staining was found in transgenic animals
as well (Fig. 4, lane 5).

Fig. 1. Efficacy of truncated tau proteins in promotion of microtubule assembly. (A) Schematic diagram of tau species tested in vitro for their
potency to promote microtubule assembly. The numbering of amino acids corresponds to that of the human tau 40 [30]. (B, C) Western blot analysis
of tau proteins using pan tau mAb DC25 and tau conformation specific mAb DC11. Recombinant human tau six isoforms (6i) were used as a
control. Monoclonal antibody DC25 (B) recognizes all tau proteins, however conformation-dependent mAb DC11 (C) stains only truncated forms of
tau proteins and does not recognize any of six human tau isoforms. (D) Microtubule assembly induced by truncated tau proteins monitored by
turbidometry at OD 340 nm at 5 min. Individual bars reflect efficacy of tau species tested in promotion of microtubule assembly. (E, F) Electron
microscopy images of microtubules induced by normal tau (E) and truncated tau (F). Samples were taken at steady state of polymerization (at 5 min).
Both figures are at the same magnification (3600x).

Fig. 2. Expression profile of human truncated tau in the brain and spinal cord of transgenic animals. (A) Pan tau monoclonal antibody DC25 was
used for staining of rat endogenous and human truncated tau protein in different brain regions and spinal cord. (B) Transgenic protein expression

Fig. 3. Truncated tau induced AD-like neurofibrillary degeneration in vivo. (A) Development of extensive argyrophilic positive neurofibrillary
tangles in 9 months old rats. High magnification of Congo red (B) and Thioflavin S positive (C) intraneuronal tangles showed similar appearance as
in human AD. (D) Stereological analyses of rat brains expressing human truncated tau showed a mean neuronal density of 5703 neurons/mm³
(S.E.M. = 250.2) in brain stem. The estimated NFT density in this brain region was 690/mm³ (S.E.M. = 101.4). Ontogeny of the neurofibrillary
degeneration in these rats is similar to that of human Alzheimer's disease: pre-tangles (E), intracellular tangles (F) and extracellular tangles (G).
Scale bars: 10 µm.

Fig. 4. Human truncated tau and endogenous rat tau are constituent
parts of massive sarcosyl insoluble tau complexes. The series of
ultracentrifugation and extraction steps was used to obtain sarcosyl
insoluble fraction of tau (P2 fraction) from brain tissues of
age-matched wild type (wt) rats, transgenic rats and human AD brain
tissues. Recombinant human six tau isoforms (lane 1) were used as a
control. Wild type rats did not show any sarcosyl insoluble tau (lane
2). Immunoblotting of P2 fraction from transgenic rats revealed
sarcosyl insoluble complexes of tau (lane 3, mAb DC25) that were
formed from human truncated tau (lane 4, HT7-human tau-specific
mAb), endogenous phosphorylated rat tau (lane 5, mAb PHF1-endogenous
rat tau specific, see Section 3), and human and rat
phosphorylated tau (lane 6, mAb AT8). P2 fraction of human AD is
shown as a positive control with characteristic A68 triplet (lane 7, mAb
DC25; lane 8, mAb PHF1) seen in rats as well (lane 5).



3.4. Quantitative analyses of human transgenic and endogenous
rat tau in sarcosyl insoluble fraction

We examined further the composition of mature sarcosyl
insoluble tau complexes with respect to the ratio between
the transgenic human truncated tau (t151-391) and endogenous
rat tau. Insoluble P2 fractions from 12 months old rats
were assayed on Western blot together with three respective
sarcosyl soluble (S2) fractions. Both soluble and insoluble
fractions were stained with pan-tau monoclonal antibody
DC25 (Fig. 5A) and with human tau-specific monoclonal
antibody HT7 (Fig. 5B). Data from immunoblotted transgenic
human tau in S2 fractions were digitized and used
as a standard for normalization of monoclonal antibodies
staining. Both antibodies stained tau protein with similar
intensities (Fig. 5). Tau staining in P2 fraction was
digitized as well (Fig. 5C and D) and the relative tau protein
amount was calculated on the basis of correlation between
amounts of proteins seen by both antibodies. The
comparison of normalized peak areas revealed that mature
sarcosyl insoluble tau complexes are composed of transgenic
human truncated tau and endogenous rat tau at 1:1 ratio
(Fig. 5F).
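
The normalization logic described in this section can be illustrated with a short calculation. The following is only a sketch under stated assumptions and is not the authors' analysis: all signal values are made-up placeholders (no densitometry values are given in the text), the function names are hypothetical, and a least-squares line through the origin is assumed for the correlation between the two antibody signals. The idea is that the soluble (S2) signals relate the HT7 signal to the DC25 signal for human tau, so that in the insoluble (P2) fraction the HT7 signal can be converted to a DC25-equivalent amount of human tau and subtracted from the total.

```python
from typing import Sequence


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y = k * x (assumed form of the correlation line)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def human_to_rat_ratio(p2_dc25: float, p2_ht7: float, k: float) -> float:
    """Ratio of human to rat tau in P2.

    p2_dc25: pan-tau signal (total tau: human plus rat)
    p2_ht7:  human-specific signal
    k:       factor converting an HT7 signal to a DC25-equivalent signal
    """
    human = k * p2_ht7
    rat = p2_dc25 - human
    return human / rat


# Hypothetical placeholder signals from three S2 lanes (arbitrary units).
s2_ht7 = [80.0, 160.0, 240.0]
s2_dc25 = [100.0, 200.0, 300.0]

k = slope_through_origin(s2_ht7, s2_dc25)

# Hypothetical placeholder signals from one P2 lane (arbitrary units).
ratio = human_to_rat_ratio(p2_dc25=500.0, p2_ht7=200.0, k=k)

print(f"conversion factor k = {k:.2f}, human:rat = {ratio:.2f}:1")
```

With these placeholder inputs the sketch returns a 1:1 ratio. This shows only how such a ratio would be derived from the two stainings, not that the reported result follows from these numbers.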

3.5. The leuel of sarcosyl insotubte formation correlates with 
lifespan of transgenic rats expressing truncated tau 

Sarcosyl insolubility of tau is generally considered to be a 
definitive transformation point of physiological tau into 
pathological form. Therefore, we analysed development of 
sarcosyl insoluble tau complexes in the brain of transgenic 
rats expressing truncated tau (U51-391). The brain tissues 
were examined at 3, 6, 9 and 12 months old animals The 
level of tau in the sarcosyl insoluble P2 fraction increased 
in an age-depend cnt manner and correlated positively with 
the development Of neurofibrillary pathology. First sarcosyl 
insoluble tan consisting exclusively of transgene - human 
truncated tau - appeared in the brain of 3 months old 
transgenic rats and persisted until the late stages of 
neurodegeneration (Fig. 6A, lanes 1, 2). The first, phosphor- 
ylation induced electropho relic mobility decrease (gel shift) 
Fig. 5. Quantitative analyses of human transgenic and endogenous rat tau in sarcosyl insoluble fraction. The fractions containing sarcosyl insoluble 
tau (P2) were analyzed with pan tau monoclonal antibody DC25 and human tau-specific antibody HT7. Lanes 1-3 show sarcosyl soluble fraction 
(S2) from three independent transgenic animals; lane 4 shows sarcosyl insoluble fraction (P2) from transgenic animal (A, B). Boxed off are P2 
fractions (A, B; lane 4) that were used for quantification (C, D). Integrated signals from S2 fractions (A, B; lanes 1-3) were used for construction of 
the correlation line (E). Ratio between human transgenic and endogenous rat tau in P2 fraction is 1:1 (F). The mAb DC25 staining reflects the total 
amount of tau present in sarcosyl insoluble P2 pellets, whereas HT7 detects human transgenic tau only. 

Fig. 6. Ontogenesis of sarcosyl insoluble tau complexes. (A) Ontogenesis of sarcosyl insoluble tau complexes (P2) in the brain of 3, 6, 9 and 12 
months old transgenic rats was monitored by Western blot analysis using pan tau mAb DC25. (B) Alzheimer's tau expression impact on lifespan of 
transgenic rats. 



of sarcosyl insoluble tau monomer was observed in 9 
months old animals (Fig. 6A, lane 3). Mature sarcosyl insoluble 
tau formations, characterized by the presence of tau 
species with high and low molecular weight, appeared in 
12 months old animals (Fig. 6A, lane 4). It is noteworthy 
that the stage of "mature sarcosyl insoluble tau formation" 
correlated with death of animals expressing transgenic human 
truncated tau. The lifespan of hemizygous animals 
was 10-12 months and that of homozygotes was 5-6 
months. Life expectancy of wild type rats is 22-24 months. 
These results show that expression of human truncated tau 
shortens lifespan in hemizygotes by 50% and in homozygotes 
by 75% (Fig. 6B). 
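
The percentages quoted above can be checked with a short calculation. The following is a minimal sketch that assumes the midpoint of each reported lifespan range is a fair summary; the function name and the midpoint convention are illustrative choices, not part of the study.

```python
def percent_reduction(observed_range, reference_range):
    """Percent reduction of the observed midpoint relative to the reference midpoint."""
    observed_mid = sum(observed_range) / 2
    reference_mid = sum(reference_range) / 2
    return 100 * (1 - observed_mid / reference_mid)

# Lifespan ranges in months, as reported in the text.
WILD_TYPE = (22, 24)
HEMIZYGOUS = (10, 12)
HOMOZYGOUS = (5, 6)

hemi = percent_reduction(HEMIZYGOUS, WILD_TYPE)
homo = percent_reduction(HOMOZYGOUS, WILD_TYPE)
print(f"hemizygotes: {hemi:.0f}% shorter, homozygotes: {homo:.0f}% shorter")
```

Under the midpoint assumption the reductions come out slightly above the rounded 50% and 75% figures given in the text.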



4. Discussion 

During the work providing molecular proof that microtu- 
bule associated protein tau is a major (if not sole) constituent 
of paired helical filaments [2,21], it was noted that tau could be 
truncated. Molecular mapping of the epitope of monoclonal 
antibody 423, which recognizes tau protein derived from AD 
brains, revealed for the first time that tau is truncated at 
E391 in Alzheimer's disease [6,12]. Since then, truncation of 
tau has been suggested by many authors as a possible seminal event 
in the pathogenesis of Alzheimer's disease [22-24]. It is generally 
agreed that tau has to undergo significant conformational 
change(s) leading to the pathological polymerization process. 
It has been shown that truncation could facilitate polymerization 
of tau in vitro [25-27]. Despite these results, the role of 
truncation and truncated tau in the AD cascade remains an open 
issue. Monoclonal antibody DC11, produced against AD 
brain derived truncated forms of tau protein, revealed the presence 
of conformationally distinct forms of tau protein in AD. 
Molecular analysis of these forms showed that DC11 recognizes 
all and only those N- and C-terminally truncated tau proteins 
that are conformationally different from normal healthy 
tau proteins [8]. The shift of tau into the "DC11 state" in AD 
could represent a new pathogenic entity and an important step 
in neurofibrillary degeneration itself. Therefore, we studied 
the effect of N- and C-terminally truncated tau species on 
microtubule assembly. Strikingly, these 
double truncated tau species promoted robust microtubule 
assembly that was 3-4 times higher (OD340 1.2-1.6) than 
microtubule assembly induced by normal healthy tau 
(OD340 0.4). Electron microscopy analysis of microtubules 
assembled by DC11 tau species revealed abnormally thick 
microtubular networks (bundles) that differed from normal 
microtubular networks. These results suggest that 
truncated tau has a large impact on microtubule assembly in vitro, 
suggesting a possible gain of altered function that could lead 
to tau transformation into a pathological entity. 

In order to explore the possible role of truncated tau in vivo, 
we expressed the most in vitro active DC11 tau species 
(t151-391) as a transgene in rat brains. Transgenic animals developed 
extensive neurofibrillary pathology satisfying several 
histopathological criteria used for identification of neurofibrillary 
degeneration in AD, including argyrophilia, Congo red 
birefringence and Thioflavin S reactivity. As in human AD, formation 
of NFT in transgenic animals passed through several 
histologically defined maturation stages. The first stage was represented 
by pre-tangle formation (identified with mAb AT8) that 
is considered to be an early event in NFT development [28,29]. 
The second stage was characterized by formation of intracellular 
argyrophilic NFTs in neuronal cell bodies and NTs in their 
processes. The late developmental stage in these transgenic animals 
was represented by the presence of extra-neuronal 
"ghost" tangles (eNFT). The well-defined staging in transgenic 
rats expressing truncated tau offers an opportunity to study the 
neurodegenerative cascade of tau protein in vivo. 

In human sporadic AD, mature neurofibrillary degeneration 
is characterized by extensive formation of sarcosyl insoluble 
tau protein complexes consisting of abnormally hyperphosphorylated 
full length and truncated tau forms [1-4]. The analysis 
of sarcosyl insoluble tau fractions derived from the brain 
of transgenic animals allowed several important 
conclusions to be drawn. First, sarcosyl insoluble tau complexes were 
composed of transgenic human truncated tau and endogenous 
rat tau at a 1:1 ratio. Second, both human and endogenous rat 
tau were phosphorylated (AT8). Third, the tau A68 triplet 
pattern characteristic of human AD [3] was formed in transgenic 
animals. Furthermore, detailed time course experiments 
of neurofibrillary maturation revealed that the first sarcosyl insoluble 
truncated tau monomer appeared already in very young 
transgenic animals (3 months old), well before the detection 
of intraneuronal tangles (9 months old). We suggest that the sarcosyl 
insoluble monomer ("one band stage") represents an immature 
developmental stage of sarcosyl insoluble complex 
formation. Further "aging" of sarcosyl insoluble tau is represented 
by intensive phosphorylation ("stage of shifted monomer"). 
Most probably the phosphorylated monomers led to 
the development of mature sarcosyl insoluble tau complexes 
("stage of tau ladder") encompassing both truncated and 
endogenous full-length tau (9-12 months old). It is intriguing 
that the stage of "mature sarcosyl insoluble tau formation" 
correlated with the death of animals expressing transgenic human 
truncated tau. The life span of hemizygous animals was 
10-12 months and that of homozygotes was 5-6 months. Life 
expectancy of wild type rats is 22-24 months. Thus truncated 
tau expression shortens the life span of hemizygotes by 50% and of 
homozygotes by 75%. 

The present study provides experimental data introducing 
truncated tau protein as an important upstream factor in the 
pathogenesis of neurofibrillary degeneration of AD type. In 
addition, our data established that truncated tau is sufficient 
to drive neurofibrillary degeneration in the absence of tau 
mutation. 
EXHIBIT 3 



Figure 1 




Fig. 1A. A statistically significant difference was observed in the acquisition of spatial 
information in transgenic rats relative to their non-transgenic litter-mates at the age of 5 
months. The escape latency (A) to find the hidden platform over the 6 days (4 daily trials) of 
the training phase (RM-ANOVA, **P<0.01). 

Fig. 1B. Water maze visible platform acquisition during 4 testing trials over one day. Visible 
platform showed no loss of motivation or visual acuity in tested rats. 

Fig. 1C, D. Significant difference in time spent in target quadrant (north) between transgenic 
rats and controls (t-test, *P<0.05) during the probe trial measured after three days of 
acquisition learning (Fig. 1C). The difference in the probe trial performed after six days 
(Fig. 1D) did not reach statistical significance (t-test, P=0.1). 

Fig. 1E, F. Number of crosses over the platform location was significantly lower in transgenic 
rats compared to wild type controls (t-test, *P<0.05) during the first probe trial (Fig. 1E). 
The number of platform crosses during the second probe trial (Fig. 1F) did not reach 
significance (t-test, P=0.06). Values represent mean ± S.E.M. 
The ascorbate free radical (AFR) electron paramagnetic resonance (EPR) signal shows a higher 
concentration of AFR in the homogenate obtained from brain stems of transgenic animals 
(5.300 ± 0.6011 nmol.g-1, N=6) than from age-matched control rats (3.583 ± 0.3156 nmol.g-1, N=6). 
The increased amount of AFR (P<0.01) in the brain of transgenic rats at the terminal stage 
indicates that oxidative stress is a consequence and not a cause of the pathological cascade in the 
transgenic rats. 
The kinetic measurement of the CK reaction showed that the rate constant value of the 
brain-specific creatine kinase (CKBB) was significantly decreased (P<0.05) in the brain of Axon 
transgenic rats (kfor = 0.2942 ± 0.01048, N=10) in comparison with age-matched controls 
(kfor = 0.3370 ± 0.01862, N=8). 
Using mAb AT180 (Fig. 4, right panel), strong AD-like phosphorylation persists in brain 
extracts from 75 and 150 day old animals. This type of phosphorylation (right panel) drives 
further development of neurofibrillary changes identical to human AD. None of these features 
is seen in any previous tau transgenic animal using a wild type or FTDP17 mutated tau transgene 
construct. The left panel of Fig. 4 shows the typical phosphorylation pattern of endogenous 
tau in embr^e^ 

remains phosphorylated thus reflecting distinct mechanisms of tau phosphorylation. 
Genome sequence of the Brown 
Norway rat yields insights into 
mammalian evolution 

Rat Genome Sequencing Project Consortium* 

* Lists of participants and affiliations appear at the end of the paper 

The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made 
inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The 
sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian 
genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian 
evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, 
comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruc- 
tion of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage- 
independent evolutionary events such as expansion of gene families, orthology relations and protein evolution. 



Darwin believed that "natural selection will always act very slowly, 
often only at long intervals of time" 1 . The consequences of evolution 
over timescales of approximately 1,000 millions of years (Myr) and 
75Myr were investigated in publications comparing the human 
with invertebrate and mouse genomes, respectively 2,3 . Here we 
describe changes in mammalian genomes that occurred in a shorter 
time interval, approximately 12-24 Myr (refs 4, 5) since the com- 
mon ancestor of rat and mouse. 

The comparison of these genomes has produced a number of 
insights: 

• The rat genome (2.75 gigabases, Gb) is smaller than the human 
(2.9 Gb) but appears larger than the mouse (initially 2.5 Gb (ref. 3) 
but given as 2.6 Gb in NCBI build 32, see http://www.ncbi.nlm. 
nih.gov/genome/seq/NCBIContiglnfo.html). 

• The rat, mouse and human genomes encode similar numbers of 
genes. The majority have persisted without deletion or duplication 
since the last common ancestor. Intronic structures are well 
conserved. 

• Some genes found in rat, but not mouse, arose through expansion 
of gene families. These include genes producing pheromones, or 
involved in immunity, chemosensation, detoxification or 
proteolysis. 

• Almost all human genes known to be associated with disease have 
orthologues in the rat genome but their rates of synonymous 
substitution are significantly different from the remaining genes. 

• About 3% of the rat genome is in large segmental duplications, a 
fraction intermediate between mouse (1-2%) and human (5-6%). 
These occur predominantly in pericentromeric regions. Recent 
expansions of major gene families are due to these genomic 
duplications. 

• The eutherian core of the rat genome — that is, bases that align 
orthologously to mouse and human — comprises a billion nucleo- 
tides (~40% of the euchromatic rat genome) and contains the vast 
majority of exons and known regulatory elements (1-2% of the 
genome). A portion of this core constituting 5-6% of the genome 
appears to be under selective constraint in rodents and primates, 
while the remainder appears to be evolving neutrally. 

• Approximately 30% of the rat genome aligns only with mouse, a 
considerable portion of which is rodent-specific repeats. Of the 
non-aligning portion, at least half is rat-specific repeats. 

• More genomic changes occurred in the rodent lineages than the 
primate: (1) These rodent genomic changes include approximately 
250 large rearrangements between a hypothetical murid ancestor 
and human, approximately 50 from the murid ancestor to rat, and 
about the same from the murid ancestor to mouse. (2) A threefold- 
higher rate of base substitution in neutral DNA is found along the 
rodent lineage when compared with the human lineage, with the 
rate on the rat branch 5-10% higher than along the mouse branch. 
(3) Microdeletions occur at an approximately twofold-higher rate 
than microinsertions in both rat and mouse branches. 
• A strong correlation exists between local rates of microinsertions 
and microdeletions, transposable element insertion, and nucleotide 
substitutions since divergence of rat and mouse, even though these 
events occurred independently in the two lineages. 

Background 
History of the rat 

The rat, hated and loved at once, is both scourge and servant to 
mankind. The "Devil's Lapdog" is the first sign in the Chinese 
zodiac and traditionally carries the Hindu god Ganesh 6 . Rats are a 
reservoir of pathogens, known to carry over 70 diseases. They are 
involved in the transmission of infectious diseases to man, including 
cholera, bubonic plague, typhus, leptospirosis, cowpox and hanta- 
virus infections. The rat remains a major pest, contributing to 
famine with other rodents by eating around one-fifth of the world's 
food harvest. 

Paradoxically, the rat's contribution to human health cannot be 
overestimated, from testing new drugs, to understanding essential 
nutrients, to increasing knowledge of the pathobiology of human 
disease. In many parts of the world the rat remains a source of 
meat. 

The laboratory rat (R. norvegicus) originated in central Asia and 
its success at spreading throughout the world can be directly 
attributed to its relationship with humans 7 . J. Berkenhout, in 
his 1769 treatise Outline of the Natural History of Great Britain, 
mistakenly took it to be from Norway and used R. norvegicus 
Berkenhout in the first formal Linnaean description of the species. 
Whereas the black rat (Rattus rattus) was part of the European 
landscape from at least the third century ad and is the species 
associated with the spread of bubonic plague, R. norvegicus probably 
originated in northern China and migrated to Europe somewhere 
around the eighteenth century 8 . They may have entered Europe 
after an earthquake in 1727 by swimming the Volga river. 

The rat in research 

R. norvegicus was the first mammalian species to be domesticated 
for scientific research, with work dating to before 1828 (ref. 9). The 
first recorded breeding colony for rats was established in 1856 
(ref. 9). Rat genetics had a surprisingly early start. The first studies 
by Crampe from 1877 to 1885 focused on the inheritance of coat 
colour 10 . Following the rediscovery of Mendel's laws at the turn of 
the century, Bateson used these concepts in 1903 to demonstrate 
that rat coat colour is a mendelian trait 10 . The first inbred rat 
strain, PA, was established by King in 1909, the same year that 
systematic inbreeding began for the mouse 10 . Despite this, the 
mouse became the dominant model for mammalian geneticists, 
while the rat became the model of choice for physiologists, nutri- 
tionists and other biomedical researchers. Nevertheless, there are 
over 234 inbred strains of R. norvegicus developed by selective 
breeding, which 'fixes' natural disease alleles in particular strains 
or colonies 11 . 

Over the past century, the role of the rat in medicine has 
transformed from carrier of contagious diseases to indispensable 
tool in experimental medicine and drug development. Current 
examples of use of the rat in human medical research include 
surgery 12 , transplantation 13-15 , cancer 16,17 , diabetes 18,19 , psychiatric 
disorders 20 including behavioural intervention 21 and addiction 22 , 
neural regeneration 23,24 , wound 25,26 and bone healing 27 , space 
motion sickness 28 , and cardiovascular disease 29-31 . In drug development, 
the rat is routinely employed both to demonstrate therapeutic 
efficacy 15,32,33 and to assess toxicity of novel therapeutic compounds 
before human clinical trials 34-37 . 

The Rat Genome Project 

Over the past decade, investigators and funding agencies have 
participated in rat genomics to develop valuable resources. Before 
the launch of the Rat Genome Sequencing Project (RGSP), there 
was much debate about the overall value of the rat genome sequence 
and its contribution to the utility of the rat as a model organism. 
The debate was fuelled by the naive belief that the rat and mouse 
were so similar morphologically and evolutionarily that the rat 
sequence would be redundant. Nevertheless, an effort spearheaded 
by two NIH agencies (NHGRI and NHLBI) culminated in the 
formation of the RGSP Consortium (RGSPC). 

The RGSP was to generate a draft sequence of the rat genome, 
and, unlike the comparable human and mouse projects, errors 
would not ultimately be corrected in a finished sequence 38 . Conse- 
quently, the draft quality was critical. Although it was expected to 
have gaps and areas of inaccuracy, the overall sequence quality had 
to be high enough to support detailed analyses. 

The BN rat was selected as a sequencing target by the research 
community. An inbred animal (BN/SsNHsd) was obtained by 
the Medical College of Wisconsin (MCW) from Harlan Sprague 
Dawley. Microsatellite studies indicated heterozygosity, so over 13 
generations of additional inbreeding were performed at the MCW, 
resulting in BN/SsNHsd/Mcwi animals. Most of the sequence data 
were from two females, with a small amount of whole genome 
shotgun (WGS) and flow-sorted Y chromosome sequencing from 
a male. The Y chromosome is not included in the current 
assembly. 

A network of centres generated data and resources, led by the 
Baylor College of Medicine Human Genome Sequencing Center 
(BCM-HGSC) and including Celera Genomics, the Genome Thera- 
peutics Corporation, the British Columbia Cancer Agency Genome 
Sciences Centre, The Institute for Genomic Research, the University 
of Utah, the Medical College of Wisconsin, The Children's Hospital 
of Oakland Research Institute, and the Max Delbruck Center for 
Molecular Medicine, Berlin. After assembly of the genome at the 
BCM-HGSC, analysis was performed by an international team, 
representing over 20 groups in six countries and relying largely on 
gene and protein predictions produced by Ensembl. 

Determination of the genome sequence 
Atlas and the 'combined' sequencing strategy 

Despite progress in assembling draft sequences 2,3,39-44 the question 
of which method produces the highest-quality products is unresolved. 
A significant issue is the choice between logistically simpler 
WGS approaches versus more complex strategies employing bacterial 
artificial chromosome (BAC) clones 45-48 . In the Public Human 
Genome Project 2 a BAC by BAC hierarchical approach was used and 
provided advantages in assembling difficult parts of the genome. 
The draft mouse sequence was a pure WGS approach using the 
ARACHNE assembler 3,49,50 but underrepresented duplicated 
regions owing to 'collapses' in the assembly 3,51-53 . This limitation 
of the mouse draft sequence was tolerable owing to the planned full 
use of BAC clones in constructing the final finished sequence. 

The RGSPC opted to develop a 'combined' approach using both 
WGS and BAC sequencing (Fig. 1). In the combined approach, 
WGS data are progressively melded with light sequence coverage of 
individual BACs (BAC skims) to yield intermediate products called 
'enriched BACs' (eBACs). eBACs covering the whole genome are 
then joined into longer structures (bactigs). Bactigs are joined to 
form larger structures: superbactigs, then ultrabactigs. During this 
process other data are introduced, including BAC end sequences, 
DNA fingerprints and other long-range information (genetic markers, 
syntenic information), but the process is constrained by eBAC 
structures. 

To execute the combined strategy we developed the Atlas software 
package 54 (Fig. 1). The Atlas suite includes a 'BAC-Fisher' com- 
ponent that performs the functions needed to generate eBACs. 
WGS genome coverage was generated ahead of complete BAC 
coverage, so a BAC-Fisher web server was established at the 
BCM-HGSC to enable users to access the combined BAC and 
WGS reads as each BAC was processed (see Methods for data 
access). Each eBAC is assembled with high stringency to represent 
the local sequence accurately, and so provide a valuable intermediate 
product that assists all users of the genome data. Additional Atlas 
modules joined eBACs and linked bactigs to give the complete 
assembly (Fig. 1). Overall, the combined approach takes advantage 
of the strengths of both previous methods, with few of the 
disadvantages. 
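
The 'bait' and 'catch' idea behind eBAC formation can be illustrated with a toy example. This is only a sketch under strong assumptions: it is not the Atlas or BAC-Fisher implementation, it matches reads by shared k-mers rather than by true sequence overlaps, it ignores reverse complements and mate pairs, and the sequences, function names and thresholds are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def catch_wgs_reads(bac_skim_reads, wgs_reads, k=8, min_shared=2):
    """Select WGS reads sharing at least min_shared k-mers with the BAC skim ('bait')."""
    bait = set()
    for read in bac_skim_reads:
        bait |= kmers(read, k)
    return [r for r in wgs_reads if len(kmers(r, k) & bait) >= min_shared]

# Toy data (hypothetical sequences, not taken from the rat genome).
bac_skim = ["ACGTTGCATGCCGATAGCTA"]
wgs = [
    "TTGCATGCCGATAGCTAGGA",  # overlaps the BAC skim read
    "GGGGGGGGCCCCCCCCAAAA",  # unrelated read
]
caught = catch_wgs_reads(bac_skim, wgs)
print(caught)
```

In the strategy described above, the caught reads and their mate pairs would then be assembled locally together with the BAC skim reads; that assembly step is outside the scope of this sketch.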

Sequence and genome data 

Over 44 million DNA sequence reads were generated (Table 1; 
Methods). Following removal of low-quality reads and vector 
contaminants, 36 million reads were used for Atlas assembly, 
which retained 34 million reads. This was 7X sequence coverage 
with 60% provided by WGS and 40% from BACs. Slightly different 
estimates came from considering the entire 'trimmed' length of the 
sequence data (7.3X), or only the portion of Phred20 quality or 
higher (6.9X). 
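
The coverage estimates can be approximated from the totals in Table 1 and the sampled genome size in Table 2. The sketch below assumes that coverage is simply total bases divided by the sampled genome size, as the Table 1 footnote states; because the tabulated values are rounded, the Phred20 figure comes out near 6.8X, slightly below the reported 6.9X.

```python
SAMPLED_GENOME_GB = 2.78   # sampled genome size (Table 2)
TRIMMED_BASES_GB = 20.2    # total trimmed bases in used reads (Table 1)
PHRED20_BASES_GB = 19.0    # bases of Phred20 quality or higher (Table 1)

def coverage(bases_gb, genome_gb=SAMPLED_GENOME_GB):
    """Sequence coverage as total bases divided by genome size."""
    return bases_gb / genome_gb

trimmed_cov = coverage(TRIMMED_BASES_GB)
phred20_cov = coverage(PHRED20_BASES_GB)
print(f"trimmed: {trimmed_cov:.1f}X, Phred20: {phred20_cov:.1f}X")
```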

The sequence data were end-reads from clones either derived 
directly from the genome (insert sizes of <10kb, 10 kb, 50 kb and 
>150kb) or from small insert plasmids subcloned from BACs. 
Overall, these provided 42-fold clone coverage, with 32-fold cover- 
age having both paired ends represented. Approximately equal 
contributions of clone coverage were from the different categories. 

Over 21,000 BACs were used for BAC skims (1.6X coverage) with 
an average sequence depth of 1.8X, giving an overall 2.8X genomic 
sequence coverage from BACs. This was slightly more than the 
most efficient procedure would require (~1.2X each), because the 
genome size was not known at the project start. 
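
The BAC contribution to sequence coverage is the product of BAC clone coverage and the average sequence depth per BAC. A minimal sketch of that arithmetic follows, using the 1.58X clone coverage from the Table 1 footnote; reading "~1.2X each" as 1.2X clone coverage combined with 1.2X depth is an interpretation, not something the text states explicitly.

```python
def genomic_coverage(clone_coverage, depth_per_clone):
    """Genomic sequence coverage contributed by skimmed BACs."""
    return clone_coverage * depth_per_clone

actual = genomic_coverage(1.58, 1.8)     # values reported for the project
efficient = genomic_coverage(1.2, 1.2)   # assumed reading of "~1.2X each"
print(f"actual: {actual:.1f}X, most efficient (assumed): {efficient:.2f}X")
```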

Simultaneous with sequencing, 199,782 clones from the 
CHORI-230 BAC library 55 were fingerprinted by restriction enzyme 
Figure 1 The new 'combined' sequence strategy and Atlas software. a, Formation of 
'eBACs'. The RGSP strategy combined the advantages of both BAC and WGS sequence 
data 54 . Modest sequence coverage (~1.8-fold) from a BAC is used as 'bait' to 'catch' 
WGS reads from the same region of the genome. These reads, and their mate pairs, are 
assembled using Phrap to form an eBAC. This stringent local assembly retains 95% of the 
'catch'. b, Creation of higher-order structures. Multiple eBACs are assembled into bactigs 
based on sequence overlaps. The bactigs are joined into superbactigs by large clone 
mate-pair information (at least two links), extended into ultrabactigs using additional 
information (single links, FPC contigs, synteny, markers), and ultimately aligned to 
genome mapping data (radiation hybrid and physical maps) to form the complete 
assembly. 
[Figure 1 panel labels: 'Bactigs' > 1 Mb; 'Superbactigs'; further joining: FPC, paired-end reads, other data; 'Ultrabactigs'; align to chromosomes] 



digestion, representing 12-fold genomic coverage 56 (Methods). 
These were assembled into a 'fingerprint contig (FPC)' map (a 
contig is a set of overlapping segments of DNA) containing 11,274 
FPCs. BAC selection for sequence skimming was based on overlaps 
between BACs using FPC mapping 56 (M.K. and C.F., unpublished 
work), ongoing BAC end sequencing (S.Z., unpublished work), and 
BAC sequence skimming 57 . This strategy led to the sequence of a 
tiling path of BAC clones, covering the whole genome. In addition 
to the FPC map, a yeast artificial chromosome (YAC)-based physical 
map was constructed. 5,803 BAC and P1-derived artificial chromosome 
(PAC) clones from RPCI-32 and RPCI-31 libraries 55 , respectively, 
were anchored to 51,323 YAC clones originating from two 
tenfold-coverage YAC libraries 58,225 assembled into 605 contigs 56 . 
This map was subsequently integrated with the FPC map and the 
sequence assembly, reducing the total number of map contigs to 376 
(minimum length of contig containing the 'typical' nucleotide, 
N50 = 172 clones, 4.4 Mb; 358 anchored to the sequence assembly; 
Supplementary Information). 

The combined strategy enabled development of resources such as 
the FPC map, BAC end sequences, and BAC skim sequences in 
parallel, rather than sequentially. In addition to allowing ongoing 



quality checking, this permitted the data-gathering phase of the 
project to be completed in less than two years. 

Atlas assembly 

Statistics for the Rnor3.1 assembly are in Table 2. Contigs within 
eBACs were ordered and oriented using read-pair information. 
Read-pair information was also used to add WGS reads to eBACs, 
even when sequence overlaps could not be reliably detected owing to 
repeated sequences. BAC skim reads with repeats were included in 
the assembly of eBACs because they clearly originated within BAC 
insert sequences. Over 19,000 eBACs were eventually generated. 

More than 98% of eBACs were successfully merged to form 
bactigs (Fig. 1). Bactigs were subsequently reassembled to process all 
reads from overlapping BACs simultaneously, and then ordered and 
oriented with respect to each other using FPC map and BAC end 
sequence read-pair information. These superbactig and ultrabactig 
structures (see below) were aligned with chromosomes using 
external information, such as positions of genetic markers. Ultra- 
bactigs represented the largest sequence units used to build 
chromosomes. 

The current release of the rat genome assembly, version Rnor3.1, 



Table 1 Clones and reads used in the RGSP 

                                     Reads (millions)                 Bases (billions)   Sequence coverage† 
Insert size* (kb)  Source or vector  All§   Used   Paired  Assembled  Trimmed  ≥Phred20  Trimmed  ≥Phred20  Clone coverage‡ 
2-4                Plasmid           9.6    8.6    7.4     7.9        4.8      4.5       1.8      1.6       3.70 
4.5-7.5            Plasmid           4.5    4.3    3.6     3.6        2.4      2.3       0.87     0.82      2.96 
10                 Plasmid           8.4    7.2    6.4     6.4        4.1      3.8       1.6      1.4       11.63 
50                 Plasmid           1.7    1.3    1.0     1.1        0.69     0.65      0.25     0.24      9.47 
150-250            BAC               0.32   0.31   0.26    0.26       0.18     0.16      0.07     0.06      9.26 
Total WGS                            24.5   21.7   18.7    19.2       12.1     11.3      4.4      4.1       37.0 
2-5                BAC skims         19.6   14.6   13.2    14.5       8.0      7.7       2.9      2.8       4.8|| 
Total                                44.1   36.3   31.9    33.7       20.2     19.0      7.3      6.9       41.8 

* Grouped in ranges of sizes for individual libraries tracked to specific multiples of 0.5 kb. 
† Total bases in used reads divided by sampled genome size including all cloned and sequenced euchromatic or heterochromatic regions. 
‡ Estimated as sum of insert sizes divided by sampled genome size. 
§ WGS reads available on the NCBI Trace Archive as of 21 March 2003; BAC skim reads attempted at BCM-HGSC as of 12 May 2003; BAC end reads obtained directly from TIGR. 
|| Refers to coverage from 2-5 kb subclones from BACs. The BACs that were skimmed amounted to 1.58X clone coverage. 
Table 2 Statistics of the RGSP draft sequence assembly 

                                                                                                      Percentage of genome‡ 
                                                                                                      Sampled (2.78 Gb)     Assembled (2.75 Gb) 
Features*                                Number   N50 length (kb)  Bases (Gb)  Bases plus gaps† (Gb)  Bases  Bases + gaps   Bases  Bases + gaps 
Anchored contigs                         127,810  38               2.476       2.481                  89.1   89.2           90.0   90.2 
Anchored superbactig scaffolds           783      5,402            2.476       2.509                  89.1   90.3           90.0   91.2 
Anchored ultrabactigs                    291      18,985           2.476       2.687                  89.1   96.6           90.0   97.7 
Unanchored superbactigs, main scaffolds  134      1,210            0.056       0.062                  2.0    2.2            2.0    2.3 
Unanchored ultrabactigs                  128      1,529            0.056       0.069                  2.0    2.5            2.0    2.5 
All superbactigs, main scaffolds         917      5,301            2.533       2.571                  91.1   92.5           92.1   93.5 
Minor scaffolds                          4,345    8                0.033       0.038                  1.2    1.4            1.2    1.4 

* Anchored sequences are those that can be placed on chromosomes because they contain known markers. The main scaffold for each superbactig is the largest set of contigs (in terms of total contig 
sequence) that can be ordered and oriented using mate-pair links and ordering of BACs. Scaffolds that cannot be ordered and oriented with respect to the main scaffold are termed minor scaffolds. 
† Ambiguous bases (N) are counted in the gap sizes, and excluded in the base counts. 
‡ Computed as bases plus gaps divided by estimated genome size. Sampled genome size is based on oligonucleotide frequency statistics of unassembled WGS reads. Assembled genome size is based on 
cumulative contig sequence following assembly. 



was generated using the data in Table 1. Earlier releases (Rnor2.0/ 
2.1, Methods) were used for a substantial part of the annotation 
and analysis of genes and proteins, whereas the current release 
provided the genome description. Rnor3.1 has 128,000 contigs, 
with N50 length 38 kb, larger than the expected genomic extent of a 
mammalian gene. These sequence contigs were linked into 783 
superbactigs that were anchored to the radiation hybrid map 59 . 
These larger units had N50 length 5.4 Mb. Another 134 smaller 
superbactigs (N50 length 1.2 Mb) could not be anchored, presumably 
because they fell into gaps between markers or because they 
were in repeated regions that could not be unambiguously placed. 
From placement on the radiation hybrid map, adjacent superbactigs 
were further linked to maximize continuity of sequence if appropriate 
read-pair mates existed or FPC suggested links. This reduced 
linked superbactigs to 419 pieces with 71 singletons. 291 ultrabactigs 
with N50 length of nearly 19 Mb were placed on chromosomes. 
Orthology information with mouse and human sequences 
was also used to resolve conflicts and suggest placement of sequence 
units. Most of the 128 unplaced units were either singletons or small 
superbactigs that consisted of few clones. Thus, nearly the entire 
genome was represented in less than 300 large sequence units. 
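
The N50 lengths quoted above follow the usual definition: the length of the contig (or scaffold) containing the 'typical' nucleotide, that is, the largest length L such that pieces of length at least L hold half of all assembled bases. The following is a minimal sketch of that calculation; the contig lengths are hypothetical toy values, not data from the assembly.

```python
def n50(lengths):
    """Return the N50: walking from longest to shortest, the length at which
    the running total first reaches half of the summed length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Hypothetical contig lengths in kb, for illustration only.
toy_contigs_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs_kb))
```

Because N50 weights each piece by its length, joining contigs into superbactigs and ultrabactigs raises the N50 even though the total number of bases stays the same.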

Quality assessment 

Thirteen megabases of high-quality finished rat sequence from 
BACs were available for comparison with Rnor3.1 (Methods). 
This analysis showed that the majority of draft bases from within 
contigs were high quality (1.32 mismatches per 10 kb). This is 
essentially the accepted accuracy standard for finished sequence (1.0 
errors per 10 kb) 60 , so the overwhelming majority of contig bases are 
highly accurate. The highest frequency of mismatches occurred at 
the ends of contigs. We calculate the average size of these lower- 
accuracy regions to be 750 base pairs (bp) and they amount to less 
than 0.9% of the genome. These regions arise from misassembly of 
terminal reads due to repeated sequences. 

Few mismatches were found within contigs. Six were found 
within contigs when compared with the 13 Mb of finished sequence, 
or one case per 2.2 Mb. All were insertions or deletions and may 
represent polymorphisms. Thus, at the fine structure level, the bulk 
of sequences that make up contigs is nearly the quality of finished 
sequence. 
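
The two quality figures above can be restated with simple arithmetic. The sketch below is only a consistency check on the numbers quoted in the text; the helper names are hypothetical.

```python
def per_base_accuracy(mismatches_per_10kb):
    """Convert a mismatch rate per 10 kb into a per-base accuracy."""
    return 1 - mismatches_per_10kb / 10_000


def mb_per_event(n_events, total_mb):
    """Average number of megabases between events."""
    return total_mb / n_events


# 1.32 mismatches per 10 kb within contigs
print(f"{per_base_accuracy(1.32):.6f}")

# Six cases found in 13 Mb of finished sequence
print(f"{mb_per_event(6, 13):.1f} Mb per case")
```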

We judged accuracy of assembly at the chromosomal level by 
alignment with linkage maps 61 and radiation hybrid map 59 (Fig. 2). 
Thirteen markers out of 3,824 from the SHRSP × BN genetic map 
were placed on different chromosomes in the assembly and in the 
genetic map. Similarly, of the 20,490 sequence tagged sites placed on 
both the assembly and radiation hybrid (v3.4) map, 96.9% had 
consistent chromosome placement 59 . Initial alignments identified 
regions of misassembly, and these were corrected, so that in 
Rnor3.1 the maps are congruent except for possible mismapped 
markers. The distribution of assembled sequence among the chromosomes 
and chromosome sizes in Rnor3.1 are in Supplementary Table 
SI-2. 

Landscape and evolution of the rat genome 
Genome size 

Genomic assemblies are usually smaller than the actual genome size 
owing to under-representation of sequences affected by cloning 
bias, and sequencing and assembly difficulties. Simply equating the 
assembled genome size with the euchromatic, cloneable portion 
does not take into account heterochromatin that may be included 62 . 
We therefore estimated both an assembled genome size, scaled by 
the inverse of the fraction of features (genetic markers, expressed 
sequence tags (ESTs), and so on) found in the Rnor3.1 assembly, 
and a cloneable (or sampled) genome size, which was the part of the 
genome present in the WGS reads before assembly, as measured by 
analysing the distribution of short oligomers 63 . The former may be 
an underestimate because non-repetitive, easily assembled regions 
can be enriched for known features. The latter should be an 
overestimate because there are likely to be regions (such as repeats) 
that can be cloned and sequenced, but not assembled. 

For the rat genome, the assembled and cloneable genome sizes are 
very close. Considering the fraction of the marker set successfully 
mapped to Rnor3.1 (92%), or the fraction of sequence finished 
outside the BCM-HGSC (to reduce bias) present in Rnor3.1 (91%), 
together with the assembled bases in main scaffolds (2.533 Gb, 
Table 2), we suggest a genome size of 2.75 Gb. Alternatively, analysis 
of the WGS oligomers of length 24 to 32 predicted a genome size of 
between 2.76 and 2.81 billion bases. We have used the more 
conservative value of 2.75 Gb for the rat genome size, but this is 




Figure 2 Map correspondence. Correspondence between positions of markers on two 
genetic maps of the rat (SHRSP × BN intercross and FHH × ACI intercross 61 ), on the rat 
radiation hybrid map 59 , and their position on the rat genome assembly (Rnor3.1). 
Axis label recovered from the figure: physical distance (Mb). 
still considerably higher (150 Mb) than the 2.6 Gb currently 
reported for the mouse draft genome sequence. A fraction of the 
size differences in these rodent genomes results from the different 
repeat content (see below); however, it is also recognized that 
segmental duplications may be under-represented in the mouse 
WGS draft sequence for technical reasons 3,51 . 
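
The scaling used for the assembled genome size estimate can be checked with a short calculation: the assembled bases in main scaffolds are divided by the fraction of known features recovered in the assembly. This is a sketch of that arithmetic using only the figures quoted in the text; the function name is an assumption.

```python
def scaled_genome_size(assembled_gb, fraction_found):
    """Scale assembled bases by the inverse of the fraction of features found."""
    if not 0 < fraction_found <= 1:
        raise ValueError("fraction_found must be in (0, 1]")
    return assembled_gb / fraction_found


MAIN_SCAFFOLD_GB = 2.533  # assembled bases in main scaffolds (Table 2)

by_markers = scaled_genome_size(MAIN_SCAFFOLD_GB, 0.92)   # marker set mapped
by_finished = scaled_genome_size(MAIN_SCAFFOLD_GB, 0.91)  # external finished sequence

print(f"{by_markers:.2f} Gb, {by_finished:.2f} Gb")

# Difference from the 2.6 Gb reported for the mouse draft, in Mb
print(round((2.75 - 2.6) * 1000))
```

Both scaled values fall at or just above 2.75 Gb, in line with the oligomer-based range of 2.76 to 2.81 Gb.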

Telomeres, centromeres and mitochondrial sequence 

The rat has both metacentric and telocentric chromosomes, in 
contrast to the wholly telocentric mouse chromosomes. As expected 
from previous draft sequences, the rat draft does not contain 
complete telomeres or centromeres. Their physical location relative 
to the rat draft sequence can however be approximated; the 
centromeres of the telocentric rat chromosomes (2, 4-10 and X) 
must be positioned before nucleotide 1 of these assemblies, and 
those for the remaining chromosomes are estimated as indicated in 
Fig. 3. Several of these putative centromere positions coincide with 
both segmental duplication blocks (see below) and classical satellite 



clusters, consistent with enrichment of both of these sequence 
features in rat pericentromeric DNA. Human subtelomere regions 
are characterized by both an abundance of segmentally duplicated 
DNA and an enrichment of internal (TTAGGG)n-like sequence 
islands 64 . Approximately one-third of the euchromatic rat subtelomeric 
regions are similarly enriched, suggesting that Rnor3.1 might 
extend very close to the chromosome ends. 

Fragments of the rat mitochondrial genome were also propagated 
within the WGS libraries and subsequently sequenced, allowing the 
assembly of the complete 16,313 bp mitochondrial genome (Supplementary 
Information). Comparison with existing mitochondrial 
sequences in the public databases revealed variable positions total- 
ling 95 bp (0.6%) between this strain and the wild brown rat. 
Considerably more variation (2.2%) was found when compared 
with the Wistar strain: 357 bp differences over the whole genome, 
including 78 positions that are conserved in the other mammalian 
sequences. Such variation has also been reported in mouse mito- 
chondrial sequences and attributed to errors in previously 
Figure 3 Distribution of segmental duplications in the rat genome. Interchromosomal 
duplications (red) and intrachromosomal duplications (blue) are depicted for all 
duplications with >90% sequence identity and >20 kb length. The intrachromosomal 
duplications are drawn with connecting blue line segments; those with no apparent 
connectors are local duplications (spaced below the figure resolution limit). The p arms are on 
the left and the q arms on the right. Chromosomes 2, 4-10, and X are telocentric; the 
assemblies begin with pericentric sequences of the q arms, and no centromeres are 
indicated. For the remaining chromosomes, the approximate centromere positions were 
estimated from the most proximal STS/gene marker to the p and q arm as determined by 
fluorescent in situ hybridization (FISH) (cyan vertical lines; no chromosome 3 data). The 
'Chr Un' sequence consists of contigs not incorporated into any chromosomes. Green 
arrows indicate 1 Mb intervals with more than tenfold enrichment of classic rat satellite 
repeats within the assembly. Orange diamonds indicate 1 Mb intervals with more than 
tenfold enrichment of internal (TTAGGG)n-like sequences. For more detail see 
http://ratparalogy.cwru.edu. 
sequenced genomes 65 . The current sequence is very accurate, and we 
therefore favour the BN sequence as a reference for the rat 
mitochondrial genome. 

Orthologous chromosomal segments and large-scale rearrangements 

Multi-megabase segments of the chromosomes of the primate- 
rodent ancestor have been passed on to human and murid rodent 
descendants with minimal rearrangements of gene order 66-68 . These 
intact regions, which are bounded by the breaks that occurred 
during ancient large-scale chromosomal rearrangements, are 
referred to as orthologous chromosomal segments. The same 
phenomenon has occurred in the descent of the rat and mouse 
from the genome of their common murid ancestor, and we were 
able to use the human genome, and in some cases other outgroup 
data, to tentatively reconstruct the sequence of many of these 
rearrangements in these lineages. To visualize the extent of ortho- 
logous chromosomal segments, each genome was 'painted' with the 
orthologous segments of the other two species (Fig. 4) using the 
Virtual Genome Painting method (M.L.G.-G. et al., unpublished 
work; http://www.genboree.org). Inspection shows the interleaving 
of events that both preceded and occurred subsequently to the rat- 
mouse divergence. 

Comparing the three species at 1 Mb resolution, BLASTZ 69 , 
PatternHunter/Grimm-Synteny 70,71 , Pash 72 , and associated merging 
algorithms 66,72,73 produce virtually indistinguishable sets of ortho- 
logous chromosomal segments. PatternHunter and the GRIMM- 
Synteny algorithm 73 detect 278 orthologous segments between 
human and rat, and 280 between human and mouse. The mouse- 
rat comparison reveals a smaller number of segments (105) of larger 
average size. The larger number of breaks in orthologous segments 
between human and either rodent than within the rodent pair is expected, because of the 
Figure 4 Map of conserved synteny between the human, mouse and rat genomes. For 
each species, each chromosome (x axis) is a two-column boxed pane (p arm at the 
bottom) coloured according to conserved synteny to chromosomes of the other two 
species. The same chromosome colour code is used for all species (indicated below). For 
example, the first 30 Mb of mouse chromosome 15 is shown to be similar to part of human 
chromosome 5 (by the red in left column) and part of rat chromosome 2 (by the olive in 
right column). An interactive version is accessible (http://www.genboree.org). 



latter's closer evolutionary relationship. 

Understanding the number and timing of rearrangement events 
that have occurred in each of the three individual lineages (see tree 
in Fig. 5a) since the common primate-rodent ancestor required a 
more detailed analysis. We initially focused on the X chromosome, 
because rearrangements between the X and the autosomes are rare 74 
and its history is somewhat easier to trace completely. The X 
chromosome consists of 16 human-mouse-rat orthologous seg- 
ments of at least 300 kb in size 73 (Fig. 6a). In the most parsimonious 
scenario (found with MGR and GRIMM 75 ), these were created by 15 
inversions in the descent from the primate-rodent ancestor 
(Fig. 6b). Outgroup data from cat, cow 76 and dog 77 resolved the 
timing of these rearrangements more precisely. Most of these events 
occurred in the rodent lineage: five (or four) before the divergence 
of rat and mouse, five in the rat lineage, and five in the mouse 
lineage. At most one rearrangement occurred in the human lineage 
since divergence from the common ancestor with rodents. The 
timing of this one event was ambiguous, owing to the limited 
resolution of the outgroup data. Even given this uncertainty, it is 
clear that the large-scale architecture of the X chromosome in 
humans is largely unchanged since the primate-rodent ancestor 73 , 
whereas there has been considerable activity in the rodents. The 
assignment of the accelerated activity to the rodent branch, follow- 
ing the primate-rodent divergence, is consistent with previous 
studies at significantly lower resolution (these showed complete 
conservation of marker order between the X chromosomes of 
human and cat 78 , human and dog 77 , and human and lemur 79 , as 
well as similar karyotypes of the X chromosomes in human, 
chimpanzees, gorillas and orangutans 80 ). 
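
The idea of explaining a set of orthologous blocks by a series of inversions can be illustrated on signed permutations, where each block carries an order and an orientation. The sketch below is a toy illustration only: it is not the MGR or GRIMM algorithm, and the function names and the five-block example are assumptions. It shows how one inversion changes block order and orientation, and uses the standard observation that a single inversion can repair at most two breakpoints, so at least half the breakpoint count (rounded up) inversions are needed.

```python
def invert(blocks, i, j):
    """Reverse blocks[i..j] (inclusive) and flip the orientation of each block."""
    return blocks[:i] + [-b for b in reversed(blocks[i:j + 1])] + blocks[j + 1:]


def breakpoints(blocks):
    """Count adjacencies that differ from the identity order 1..n.

    The permutation is framed by 0 and n+1 so that the chromosome
    ends are also checked.
    """
    framed = [0] + list(blocks) + [len(blocks) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def min_inversions_lower_bound(blocks):
    """Each inversion removes at most two breakpoints."""
    return (breakpoints(blocks) + 1) // 2


ancestor = [1, 2, 3, 4, 5]
descendant = invert(ancestor, 1, 3)
print(descendant)                              # [1, -4, -3, -2, 5]
print(breakpoints(descendant))                 # 2
print(min_inversions_lower_bound(descendant))  # 1
```

Applying the same inversion to `descendant` restores the ancestral order, which is the sense in which a scenario of inversions connects two genomes.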

Large-scale reconstruction of the entire ancestral murid genome 
suggests that it retained many previously postulated chromosome 
associations of the placental ancestor 81,82 . The most parsimonious 
scenario we found requires a total of 353 rearrangements: 247 
between the murid ancestor and human, 50 from the murid 
ancestor to mouse and 56 from the murid ancestor to rat. A recent 
study 82 implies that most of the 247 rearrangements between the 
murid ancestor and human occurred on the evolutionary subpath 
from the squirrel-mouse-rat ancestor to the murid ancestor. Our 
Figure 5 Substitutions and microindels (1-10 bp) in the evolution of the human, mouse 
and rat genomes. a, The lengths of the labelled branches in the tree are proportional to the 
number of substitutions per site inferred using the REV model 222 from all sites with aligned 
bases in all three genomes. b, The table shows the midpoint and variation in these 
branch-length estimates when estimated from different sequence alignment programs 
and different neutral sites, including sites from ancestral repeats 3 , fourfold degenerate 
sites in codons, and rodent-specific sites ('in neutral sites only' row; Supplementary 
Information). Other rows give midpoints and variation for micro-indels on each branch of 
the tree in a. 
analyses confirm that the rate of rearrangements in murid rodents is 
much higher than in the human lineage 73 . 

Segmental duplications 

Segmental duplications are defined here as regions of the genome 
that are repeated over at least 5 kb of length and >90% identity. The 
rat has approximately 2.9% of its bases in these duplicated regions 
(Fig. 3), whereas the human genome has 5-6% 83 . In contrast to the 
greater rate of large-scale rearrangement, the mouse genome shows 
substantially fewer of these events 3 , with only 1.0-2.0% 51 of its 
sequenced bases in duplicated regions. These duplicated structures 
are particularly challenging to assemble, and we attribute at least 
some of the mouse-rat differences to the BAC-based approach we 
used for Rnor3.1, compared with the WGS mouse approach. The 
vast majority of these sequences (73 of 82 Mb) were regions with 
<99.5% identity and thus were not simply overlapping sequences 
that had not been joined by the assembly program Phrap. The 
'unplaced' chromosome in Rnor3.1 showed a marked enrichment 
for blocks of segmental duplication (nearly 44% of the total), which 
indicates problems with anchoring these elements to the genome. 

Intrachromosomal duplications are represented at a three-to-one 
excess when compared with interchromosomal duplications, and 
are significantly enriched near the telomeres and in centromeric 
regions (Fig. 3). The pericentromeric accumulation of segmental 
duplications in the rat is reminiscent of that observed in human and 
mouse 83-86 , and seems to be a general property of mammalian 
chromosome architecture. 

We observed considerable clustering of duplications 87 , including 
41 discrete genomic regions larger than 1 Mb in size in which 
duplications appear to be organized into groups with <100 kb 



between duplicated segments. For many of these clusters, the 
underlying sequence alignments showed a wide range in the degree 
of sequence identity, suggesting that these areas have been subject to 
duplication events more or less continuously over millions of years. 
In contrast, an analysis of the evolutionary distance between all 
duplicated regions showed an unusual bimodal distribution, par- 
ticularly for intrachromosomal segmental duplications. Two peaks 
were observed at 0.045 substitutions per site and 0.075 substitutions 
per site. Given that the rat genome has accumulated 8-10% 
substitutions (see below) since the speciation from mouse 
12-24 Myr ago, this bimodal distribution may correspond to bursts 
of segmental duplication that occurred approximately 5 and 8 Myr 
ago, respectively. 
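
The dating of the two duplication bursts can be roughly reproduced under explicit assumptions. The sketch below assumes (1) midpoint values of 9% substitutions on the rat branch and 18 Myr since the rat-mouse split, and (2) that the divergence between two duplicate copies accumulates independently on both copies, hence the factor of two. Neither assumption is stated in the text, so this is a plausibility check rather than the authors' calculation.

```python
def duplication_age_myr(copy_divergence, branch_subs_per_site, split_myr):
    """Estimate the age of a duplication from the divergence between its copies.

    rate = substitutions per site per Myr on a single lineage;
    two copies diverge from each other at twice that rate.
    """
    rate = branch_subs_per_site / split_myr
    return copy_divergence / (2 * rate)


BRANCH_SUBS = 0.09  # assumed midpoint of the 8-10% range
SPLIT_MYR = 18      # assumed midpoint of the 12-24 Myr range

for peak in (0.045, 0.075):
    age = duplication_age_myr(peak, BRANCH_SUBS, SPLIT_MYR)
    print(f"peak {peak}: about {age:.1f} Myr")
```

With these midpoints the two peaks correspond to about 4.5 and 7.5 Myr, close to the approximate 5 and 8 Myr given above; other choices within the quoted ranges shift the estimates accordingly.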

The segmental duplications in the rat genome were of consider- 
able interest because they represent an important mechanism for 
the generation of new genes. We found that 63 NCBI reference 
sequence 88 (RefSeq; see http://www.ncbi.nih.gov/RefSeq/) genes 
were located completely or partially within rat duplicated regions, 
out of a genome total of 4,532 rat RefSeq genes. As discussed below, 
many of these genes are present in multiple copies and belong to 
gene families that have been recently duplicated and contribute to 
distinctive elements of rat biology. 

Gains and losses of DNA 

In addition to large rearrangements and segmental duplications, 
genome architecture is strongly influenced by insertion and deletion 
events that add and remove DNA over evolutionary time. To 
characterize the origins and losses of sequence elements in the 
human, mouse and rat genomes, we categorized all the nucleotides 
in each of the three genomes, using our alignment data and 
Figure 6 X chromosome in each pair of species. a, GRIMM-Synteny 71 computes 16 
three-way orthologous segments (>300 kb) on the X chromosome of human, mouse and 
rat, shown for each pair of species, using consistent colours. b, The arrangement (order 
and orientation) of the 16 blocks implies that at least 15 rearrangement events 
occurred during X chromosome evolution of these species. The program MGR 
(http://www.cs.ucsd.edu/groups/bioinformatics/MGR/) determined that evolutionary scenarios 
with 15 events are achievable and all have the same median ancestor (located at the last 
common mouse-rat ancestor). Shown is a possible (not unique) most parsimonious 
inversion scenario from each species to that ancestor. We note that the last common 
ancestor of human, mouse and rat should be on the evolutionary path between this 
median ancestor and human. 
RepeatMasker annotations of the insertions of repetitive elements 
(Fig. 7). The rodent repeat database used by RepeatMasker was 
greatly expanded by analysing the rat and mouse genomes 89 , but it is 
clear that not all repeats are being recognized, especially the older 
ones. Thus, these estimates of the amount of rodent repeats 
represent lower bounds. 

About a billion nucleotides (39% of the euchromatic rat genome) 
align in all three species, constituting an 'ancestral core' that is 
retained in these genomes. This ancestral core contains 94-95% of 
the known coding exons and regulatory regions. Comparisons 
between the human and mouse genomes, using transposon relics 
retained in both species ('mammalian ancestral repeats') to model 
neutral evolution, have been used to estimate the fraction of the 
human genome that is accumulating substitutions more slowly than 
the neutral rate in both lineages since their divergence, and hence 
may be under some level of purifying selection 3 . Depending on 
details of methodology, such estimates have ranged between about 
4% and 7% 3,90,91 . The levels of three-way conservation observed here 
between the human, mouse and rat genomes in the ancestral core 
lend further support to these earlier estimates, giving values in the 
range of 5-6% when measured by two quite different methods (see 
Methods and ref. 92). In this constrained fraction, non-coding 
regions outnumber coding regions regardless of the strength of 
constraint 92 , an observation that supports recent comparative 




Figure 7 Aligning portions and origins of sequences in rat, mouse and human genomes. 
Each outlined ellipse is a genome, and the overlapping areas indicate the amount of 
sequence that aligns in all three species (rat, mouse and human) or in only two species. 
Non-overlapping regions represent sequence that does not align. Types of repeats 
classified by ancestry: those that predate the human-rodent divergence (grey), those that 
arose on the rodent lineage before the rat-mouse divergence (lavender), species-specific 
(orange for rat, green for mouse, blue for human) and simple (yellow), placed to illustrate 
the approximate amount of each type in each alignment category. Uncoloured areas are 
non-repetitive DNA; the bulk is assumed to be ancestral to the human-rodent 
divergence. Numbers of nucleotides (in Mb) are given for each sector (type of sequence 
and alignment category). Detailed results are tabulated (Supplementary Table SI-1). 
analyses limited to subsets of the genome 93,94 . The preponderance 
of non-coding elements in the most constrained fraction of the 
genome underscores the likelihood that they play critical roles in 
mammalian biology. 

About 700 Mb (28%) of the rat euchromatic genome aligns only 
with the mouse. At least 40% of this consists of rodent-specific 
repeats inserted on the branch from the primate-rodent ancestor to 
the murid ancestor, and some of the remainder can be recognized as 
mammalian ancestral repeats whose orthologues were deleted in the 
human lineage (Fig. 7). Another part is likely to consist of single- 
copy ancestral DNA deleted in the human lineage but retained in 
rodents. Although this 700 Mb of rodent-specific DNA is primarily 
neutral, it may also contain some functional elements lost in the 
human lineage in addition to sequences representing gains of 
rodent-specific functions, including some coding exons 95 . 

The remainder of the euchromatic rat genome (726 Mb, 29%) 
aligns with neither mouse nor human (Fig. 7). At least half of this 
(15% of the rat genome) consists of rat-specific repeats, and another 
large fraction (8% of the rat genome) consists of rodent-specific 
repeats whose orthologues are deleted in the mouse. 

Substitution rates 

The alignment data allow relatively precise estimates of the rates 
of neutral substitutions and microindel events (≤10 bp). Both 
synonymous fourfold degenerate ('4D') sites in protein-coding 
regions and sites in mammalian ancestral repeats were used in 
this analysis, as in previous studies comparing human and 
mouse 3,96 . We additionally used a class of primarily neutral sites 
whose identification is made uniquely possible by the addition of 
the rat genome sequence: namely, the rodent-specific sites discussed 
above, identified by their failure to align to human sequence. 

Our estimates for the neutral substitution level between the two 
rodents range from 0.15 to 0.20 substitutions per site, while 
estimates for the entire tree of human, mouse and rat range from 
0.52 to 0.65 substitutions per site (Fig. 5). This difference was 
predictable because of the evolutionary closeness of the two rodents. 
For all classes of neutral sites analysed, however, the branch 
connecting the rat to the common rodent ancestor is 5-10% longer 
than the mouse branch (Fig. 5a). Thus, for as yet unknown reasons, 
the rat lineage has accumulated substantially more point substi- 
tutions than the mouse lineage since their last common ancestor. 

We also analysed four-way alignments including sequence from 
orthologous ancestral repeats in human, mouse and rat, along with 
the repeat consensus sequences, which approximate the sequence of 
the progenitor of the corresponding repeat family (Methods). These 
alignments allow us to distinguish substitutions on the branch from 
the primate-rodent ancestor to the rodent ancestor from substi- 
tutions on the branch descending to human 77 . This revealed an 
overall speed-up in rodent substitution rates relative to human of 
about three-to-one, larger than estimated previously 3 , but consist- 
ent with other more recent studies which also use multiple sequence 
alignments 77,97,98 . 

Estimates for rates of microdeletion events are, for all branches, 
approximately twofold higher than rates of microinsertion (Fig. 5b), 
suggesting a fundamental difference in the mechanisms that gen- 
erate these mutations. Furthermore, there are substantial rate 
differences for each class of event between the various lineages. In 
particular, the rat lineage has accumulated microdeletions more 
rapidly than the mouse, while the opposite holds true for micro- 
insertions. As with substitutions, both microinsertion and micro- 
deletion rates are substantially slower in the human lineage. The size 
distribution of microindels (1-10 bp) on the rat branch was heavily 
weighted towards the smallest indels: 45% of indels are single bases, 
18% are 2 bp, 10% are 3 bp, 8% are 4 bp, and so on, monotonically 
decreasing. Separate distributions for insertions and for deletions 
were similar, as were distributions of indel sizes on the mouse 
branch. 
Male mutation bias 

As mouse and rat are similar in generation time and number of 
germline cell divisions 99,100 , we investigated a potential sex bias in 
different types of observed genome changes. We compared substi- 
tution and indel rates between the X chromosome and autosomes in 
ancestral repeat sites (~5 Mb and ~100 Mb in total for X and 
autosomes, respectively 101 ). We discovered that in rodents, small 
indels (<50 bp) are male-biased, with a male-to-female rate ratio of 
~2.3. This is in contrast to a recent study in primates, based on a 
substantially smaller data set, that indicates no sex bias in small 
indels 102 . Our male-to-female nucleotide substitution rate ratio in 
rodents is ~1.9, confirming earlier reports 103,104 . When substitution 
rates are compared for all sites aligned between mouse and rat 
(~78 Mb and ~1,691 Mb for X and autosomes, respectively), we again observe an 
approximately twofold excess of small indels and nucleotide sub- 
stitutions originating in males compared with females 101 . Interest- 
ingly, the ratio in the number of cell divisions between the male and 
female germlines is also about two 99,100 , suggesting that these 
substitutions may arise from mutations that occur primarily during 
DNA replication. 
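
A male-to-female rate ratio is commonly inferred from an X-to-autosome comparison because the X chromosome spends one-third of its time in males and autosomes spend one-half. The text does not state which estimator was used, so the relation below (the standard form usually attributed to Miyata and colleagues) is an assumption, and the function names are hypothetical.

```python
def x_to_autosome_ratio(alpha):
    """Expected X/autosome rate ratio for a male-to-female rate ratio alpha.

    X rate is proportional to (alpha + 2) / 3, autosomal rate to (alpha + 1) / 2.
    """
    return 2 * (2 + alpha) / (3 * (1 + alpha))


def alpha_from_ratio(ratio):
    """Invert the relation above; valid only for 2/3 < ratio <= 4/3."""
    if not 2 / 3 < ratio <= 4 / 3:
        raise ValueError("X/A ratio outside the range the model can explain")
    return (4 - 3 * ratio) / (3 * ratio - 2)


# X/A ratios implied by the male-to-female ratios quoted in the text
for alpha in (1.9, 2.3):
    print(f"alpha = {alpha}: X/A = {x_to_autosome_ratio(alpha):.2f}")
```

Under this model the quoted ratios of about 1.9 and 2.3 correspond to X chromosome rates roughly 10-13% lower than autosomal rates, which shows how modest the underlying rate difference is.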

G+C content and CpG islands 

The G+C content of the rat varies significantly across the genome 
(Fig. 8a), and the distribution more closely resembles that of mouse 
than human. The variation in G+C content is coupled with 
differences in the distribution of CpG islands, short regions that 
are associated with the 5' ends of genes and gene regulation 2,3,105 , 
and that escape the depletion of CpG dinucleotides that occurs from 
Figure 8 Base composition distribution analysis. a, The fraction of 20 kb non-overlapping 
windows 3 with a given G+C content is shown for human, mouse and rat. b, The number 
of Ensembl-predicted genes per chromosome and the number of CpG islands per 
chromosome. The density of CpG islands averages 5.9 islands per Mb across 
chromosomes and 5.7 islands per Mb across the genome. Chromosome 1 has more CpG 
islands than other chromosomes, yet neither the island density nor ratio to predicted 
genes exceeds the normal distribution. The number of CpG islands per chromosome and 
the number of predicted genes are correlated (R² = 0.96). 



deamination of methylated cytosine 2,105 . The 2.6 Gb rat genome 
assembly (including unmapped sequences) contains 15,975 CpG 
islands in non-repetitive sequences of the genome. This is similar to 
the 15,500 CpG islands reported in the 2.5 Gb mouse genome 3 , but 
far fewer than the 27,000 reported in the human genome 2,3,105 . 

A summary of the CpG island distributions by chromosome is 
given in Fig. 8b. Chromosome X, with a low G+C content of 37.7%, 
has the fewest islands (362) and the lowest density of islands (2.6 per 
Mb). Chromosome 12 is at the other end of the range with a G+C 
content of 43.5% and the highest density of CpG islands (11.5 
islands per Mb). This is similar to chromosome 10, with 11.3 islands 
per Mb. The average density of CpG islands is 5.7 islands per Mb 
over the whole genome and 5.9 CpG islands per Mb averaged by 
chromosome, which is similar to the distribution in mouse 3 . 
Neither rodent genome shows the extreme outliers in CpG island 
density that are seen for human chromosome 19 (ref. 2). The 
density of CpG islands in the rat genome correlates positively 
with the density of predicted genes (R² of 0.96) (Fig. 8b). 

These data show that the overall changes in CpG island content 
predate the rat-mouse split and are consistent with the accelerated 
loss of CpG dinucleotides in rodents compared with humans 105,106 . 
It remains possible, however, that occurrences such as the greater 
number of human regions with extremely high G+C content are 
due to distributional changes mostly in the primate, rather than in 
the rodent lineage. 
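
The G+C distribution in Fig. 8a is based on non-overlapping 20 kb windows. A minimal sketch of such a windowed calculation follows; excluding ambiguous bases (N) from the denominator and dropping a trailing partial window are assumptions of this sketch, not details given in the text.

```python
def gc_fraction_windows(seq, window=20_000):
    """G+C fraction in consecutive non-overlapping windows of a sequence.

    Ambiguous bases are ignored; a window with no A/C/G/T yields None.
    A trailing partial window is dropped.
    """
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        fractions.append(gc / (gc + at) if gc + at else None)
    return fractions


# Toy sequence with a 4 bp window, for illustration only
print(gc_fraction_windows("GCGCNNNNATAT", window=4))  # [1.0, None, 0.0]
```

A histogram of the returned values for each genome would give the kind of distribution compared across human, mouse and rat.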

Shift in substitution spectra between mouse and rat 

The non-repetitive fraction of the rat genome is enriched for G+C 
content relative to the mouse genome, by ~0.35% over 1.3 billion 
nucleotides. This is a subtle but substantial difference that may be 
explained, at least in part, by differences in the spectra of mutation 
events that have accumulated in the mouse and rat lineages. We 
analysed all alignment columns in which substitution events can be 
assigned to either the mouse or the rat lineage, by virtue of a 
nucleotide match between human and only one rodent 92 ; note that 
this is a small minority of substitutions. Of the ~117 million 
alignment columns meeting this criterion, ~60 million involve a 
change in the rat lineage versus ~57 million in the mouse, reflecting 
the increase in rates of point substitution in the rat lineage (Fig. 5b). 
While 50% of these changes in rat involve a substitution from an 
A/T to a G/C, these events constitute only 47% of all mouse changes. 
The complementary change, G/C to A/T, exhibits relative excess in 
the mouse versus the rat lineage (38% versus 35%, respectively). No 
substantial difference between changes that do not alter G+C 
content is observed. In addition, this bias is not confined to 
particular transition or transversion events, nor can it be explained 
simply as a result of divergent substitution rates of CpG dinucleo- 
tides (data not shown). Thus, this shift appears to be a general 
change that results in an increase in G+C content in the rat genome. 
Biochemical changes in repair or replication enzymes might be 
responsible, and the observation that recombination rates are 
slightly higher in rat than in mouse 107 may suggest a role for 
G+C-biased mismatch repair 108,109 . However, population genetic 
factors, such as selection, cannot be ruled out. 
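
As a rough consistency check, the net G+C gain implied by the lineage-assigned substitutions can be compared with the observed G+C difference. The sketch below uses only the rounded figures quoted in this section and is not a reconstruction of the authors' analysis; the helper name is hypothetical.

```python
def net_gc_gain(n_changes, frac_at_to_gc, frac_gc_to_at):
    """Net number of sites that gained G+C among lineage-assigned changes."""
    return n_changes * (frac_at_to_gc - frac_gc_to_at)


rat_gain = net_gc_gain(60e6, 0.50, 0.35)    # rat lineage
mouse_gain = net_gc_gain(57e6, 0.47, 0.38)  # mouse lineage

excess = rat_gain - mouse_gain
fraction_of_aligned = excess / 1.3e9        # over ~1.3 billion nucleotides

print(f"rat net gain:   {rat_gain / 1e6:.2f} million sites")
print(f"mouse net gain: {mouse_gain / 1e6:.2f} million sites")
print(f"excess in rat:  {100 * fraction_of_aligned:.2f}% of sites")
```

The excess is of the same order as the ~0.35% enrichment reported above, which is consistent with the suggestion that the shift in substitution spectra accounts for at least part of the difference. Given the rounding of the inputs, closer agreement should not be expected.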

Evolutionary hotspots 

Comparison of the two rodent genomes, using human as outgroup, 
reveals regions that are conserved yet under different levels of 
constraint in mouse and rat. These regions may have distinct 
functional roles and contribute to species-specific differences. 
Analysis of the MAVID alignments 110 revealed 5,055 regions 
>100 bp, in which there was at least a tenfold difference in the 
estimated number of substitutions per site on the mouse and rat 
branches. To avoid alignment problems and fast-evolving regions, 
the analysis was restricted to regions where the human branch had 
<0.25 substitutions per site 111 . These regions are enriched twofold 
in transcribed regions: 39% of mouse hotspots were found in the 
18% of the mouse genome covered by RefSeq genes; and 17% of the 
rat hotspots were found in the 8% of the rat genome covered by 
RefSeq genes. Similar numbers are observed when examining 
coding exon and EST regions (not shown). Half of all hotspots in 
the mouse genome lie totally in non-coding regions. Many hotspots 
are several hundred bases long, with average length 190 ± 86 bp. 
Future work aimed at identifying the genomic differences that 
contribute to phenotypic evolution may benefit from analyses 
such as these, which will become more powerful as the repertoire 
of mammalian genome sequences expands. 
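
The twofold enrichment of hotspots in transcribed regions follows directly from the percentages quoted above: the share of hotspots falling in RefSeq genes is divided by the share of the genome those genes cover. The function name in this sketch is an assumption.

```python
def fold_enrichment(frac_hotspots_in_region, frac_genome_in_region):
    """Observed share of hotspots in a region relative to the region's share of the genome."""
    return frac_hotspots_in_region / frac_genome_in_region


mouse = fold_enrichment(0.39, 0.18)  # 39% of hotspots in 18% of the genome
rat = fold_enrichment(0.17, 0.08)    # 17% of hotspots in 8% of the genome

print(f"mouse: {mouse:.1f}-fold, rat: {rat:.1f}-fold")
```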

Covariation of evolutionary and genomic features 

To illustrate the genomic and evolutionary landscape of a single rat 
chromosome in depth, we characterized features for rat chromo- 
some 10 at 1 Mb resolution (Fig. 9). This high-resolution analysis 
uncovered strong correlations between certain microevolutionary 
features 89,92,98 . Particularly strongly correlated are the local rates of 
microdeletion (R² = 0.71; Fig. 9a), microinsertion (R² = 0.56; Fig. 
9a), and point substitution (R² = 0.86; Fig. 9b) between the two 
independent lineages of mouse and rat. In addition, microinsertion 
rates are correlated with microdeletion rates (R² = 0.55; Fig. 9a). 
These strong correlations are also observed in an independent 
genome-wide analysis, both on the original data and after factoring 
out the effects of G+C content (not shown, see Supplementary 
Information). 
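
As an illustration of how such a value can be obtained, the sketch below computes the squared Pearson correlation between two series of per-window rates. It assumes that R² here denotes the squared Pearson coefficient over 1 Mb windows; the window values are invented placeholders, not data from the paper.

```python
from math import fsum

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = fsum(x) / len(x)
    mean_y = fsum(y) / len(y)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)

# Hypothetical per-window rates (events per site) for consecutive 1 Mb windows.
mouse_rate = [0.010, 0.012, 0.015, 0.011, 0.018, 0.020]
rat_rate = [0.011, 0.013, 0.014, 0.012, 0.019, 0.022]

r2 = r_squared(mouse_rate, rat_rate)
print(f"R^2 between lineages: {r2:.2f}")
```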

Perhaps surprisingly, substantially less correlation is seen between 
microindel and point substitution rates (compare Fig. 9a and b). 
The amount of correlation varies among chromosomes (not 
shown), but is generally weaker than the relationships mentioned 
above. Further studies will be required to determine whether local 
evolutionary pressures, which must have remained stable since the 
separation of the mouse and rat lineages, differentially drive 
microindel and point substitution rates. 

We also find that the local point substitution rate in sites common 
to human, mouse and rat strongly correlates with that in rodent-specific 
sites (R² = 0.57; Fig. 9b, blue line versus red/green). These 
two classes of sites, while interdigitated at the level of tens to 
thousands of bases, constitute sites that are otherwise evolutionarily 
independent. This result confirms that local rate variation is not 
solely determined by stochastic effects and extends, at high resol- 
ution, the previously documented regional correlation in rate 
between 4D sites and ancestral repeat sites 3,96 . 

Evolution of genes 

A substantial motivation for sequencing the rat genome was to 
study protein-coding genes. Besides being the first step in accurately 
defining the rat proteome, this fundamental data set yields insights 
into differences between the rat and other mammalian species with 
a complete genome sequence. Estimation of the rat gene content is 
possible because of relatively mature gene-prediction programs and 
rodent transcript data. Mouse and human genome sequences also 
allow characterization of mutational events in proteins such as 
amino acid repeats and codon insertions and deletions. The quality 
of the rat sequence also allows us to distinguish between functional 
genes and pseudogenes. 

Figure 9 Variability of several evolutionary and genomic features along rat chromosome 
10. a, Rates of microdeletion and microinsertion events (less than 11 bp) in the mouse 
and rat lineages since their last common ancestor, revealing regional correlations. 
b, Rates of point substitution in the mouse and rat lineages. Red and green lines represent 
rates of substitution within each lineage estimated from sites common to human, mouse 
and rat. Blue represents the neutral distance separating the rodents, as estimated from 
rodent-specific sites. Note the regional correlation among all three plots, despite being 
estimated in different lineages (mouse and rat) and from different sites (mammalian 
versus rodent-specific). c, Density of SINEs inserted independently into the rat or mouse 
genomes after their last common ancestor. d, A+T content of the rat, and density in the 
rat genome of LINEs and SINEs that originated since the last common ancestor of human, 
mouse and rat. Pink boxes highlight regions of the chromosome in which substitution 
rates, A+T content and LINE density are correlated. Blue boxes highlight regions in which 
SINE density is high but LINE density is low. 

We estimate (on the basis of a subset) that 90% of rat genes 
possess strict orthologues in both mouse and human genomes. Our 
studies also identified genes arising from recent duplication events 
occurring only in rat, and not in mouse or human. These genes 
contribute characteristic features of rat-specific biology, including 
aspects of reproduction, immunity and toxin metabolism. By 
contrast, almost all human 'disease genes' have rat orthologues. 
This emphasizes the importance of the rat as a model organism in 
experimental science. 

Construction of gene set and determination of orthology 

The Ensembl gene prediction pipeline 112 predicted 20,973 genes 
with 28,516 transcripts and 205,623 exons (Methods). These genes 
contain an average of 9.7 exons, with a median exon number of 
6.0. At least 20% of the genes are alternatively spliced, with an 
average of 1.3 transcripts predicted per gene. Of the 17% single exon 
transcripts, 1,355 contain frameshifts relative to the predicted 
protein and 1,176 are probably processed pseudogenes. Of the 
28,516 transcripts, 48% have both 5' and 3' untranslated regions 
(UTRs) predicted and 60% have at least one UTR predicted. 

These gene predictions considered homology to other sequences, 
including 26,949 rodent proteins, 4,861 non-rodent, vertebrate 
proteins, 7,121 rat complementary DNAs from RefSeq and EMBL, 
and 31,545 mouse cDNAs from Riken, RefSeq and EMBL. The 
majority (61%) of transcripts are supported by rodent transcript 
evidence. When combined with additional private EST data, the 
fraction of genes supported by transcript evidence could be 
increased to 72% 113 . 

A number of other ab initio (GENSCAN 114 , GENEID 115 ), similarity-based 
(FGENESH++; ref. 116) and comparative (SGP 117 , 
SLAM 118 , TWINSCAN 119-121 ) gene-prediction programs were used 
to analyse the rat genome. The number of genes predicted by these 
programs ranged from 24,500 to 47,000, suggesting coding densities 
ranging from 1.2% to 2.2%. The coding fraction of RefSeq genes 
covered by these predictions ranged from 82% to 98%. Such 
comparative ab initio programs using the rat genome were success- 
fully used to identify and experimentally verify genes missed by 
other methods in rat 121 and human 122 . The predictions of these 
programs can be accessed through the UCSC genome browser and 
Ensembl websites. 

RefSeq genes (20,091 human, 11,342 mouse and 4,488 rat) 
mapped onto genome assemblies with BLAT 123 and the UCSC 
browser revealed that the number of coding exons per gene and 
average exon length were similar in the three species. Differences 
were observed in intron length, with an average of 5,338 bp in 
human, 4,212 bp in mouse and 5,002 bp in rat. These differences 
were also found in a smaller collection of 6,352 confidently mapped 
orthologous intron triads (see 'Conservation of intronic splice 
signals' section below): average intron lengths in this collection 
were 4,240 bp in human, 3,565 bp in mouse and 3,638 bp in rat. 

Properties of orthologous genes 

Orthology relationships were predicted on the basis of BLASTp 
reciprocal best-hits between proteins of genome pairs (human-rat, 
rat-mouse and mouse-human) 3 (Supplementary Information). 
Using these methods and the Ensembl prediction sets, 12,440 
rat genes showed clear, unambiguous 1:1 correspondence with a 
gene in the mouse genome. This is an underestimate, because 
random sampling of different classes of rat genes with less stringent 
criteria for comparison to mouse always identified additional gene 
pairs. Errors arose from pseudogene misclassification, sequence 
loss, duplication or fragmentation in assemblies; and missing or 
inappropriate gene predictions, including coding-gene predictions 
from non-coding RNAs. Taking these errors into account, we 
estimate the true proportion of 1:1 orthologues in rat and mouse 
genomes to lie between 86 and 94% (Methods). The remaining 
genes were associated with lineage-specific gene family expansions 
or contractions. These overall observations are consistent with a 
careful analysis of rat proteases showing that 93% of these genes 
have 1:1 orthologues in mouse 124,125 . 
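
The reciprocal best-hit criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the paper: the score dictionaries stand in for parsed BLASTp results, the gene names are invented, and tied best hits are simply discarded as ambiguous.

```python
def best_hits(scores):
    """Map each query to its single top-scoring subject.

    `scores` maps (query, subject) -> similarity score (for example a bit
    score). Queries whose top score is tied are dropped as ambiguous.
    """
    best = {}
    for (query, subject), score in scores.items():
        current = best.get(query)
        if current is None or score > current[1]:
            best[query] = (subject, score, False)
        elif score == current[1]:
            best[query] = (current[0], current[1], True)  # mark the tie
    return {q: subj for q, (subj, _, tied) in best.items() if not tied}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical scores; identifiers are placeholders, not real gene names.
rat_vs_mouse = {
    ("ratA", "musA"): 950.0,
    ("ratA", "musB"): 400.0,
    ("ratB", "musB"): 700.0,
    ("ratC", "musB"): 650.0,
}
mouse_vs_rat = {
    ("musA", "ratA"): 940.0,
    ("musB", "ratB"): 710.0,
    ("musB", "ratC"): 640.0,
}

pairs = reciprocal_best_hits(rat_vs_mouse, mouse_vs_rat)
print(pairs)
```

In this toy input ratC has no 1:1 partner because its best hit, musB, prefers ratB. Lineage-specific duplicates of this kind are one reason the strict 1:1 count is an underestimate.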

Surprisingly, a similar proportion (89 to 90%) of rat genes 
possessed a single orthologue in the human genome. Because 
human represents an outgroup to the two rodents, it was expected 
that mouse and rat would share a higher fraction of orthologues. A 
close inspection of gene relationships indicates that these findings 
may suffer from incompleteness of rodent genome sequences, 
together with problems of misassembly and gene prediction within 
clusters of gene paralogues. 

Further analysis of orthologous pairs considered the occurrence 
of nucleotide changes within protein-coding regions that reflected 
synonymous or non-synonymous substitutions. The majority of 
these studies measured evolutionary rates by determination of K A 
(number of non-synonymous substitutions per non-synonymous 
site) and K S (number of synonymous substitutions per synonymous 
site). K A /K S ratios of less than 0.25 indicate purifying selection, 
values of 1 suggest neutral evolution, and values greater than 1 
indicate positive selection 126 . 
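
The interpretation rule above can be written as a small helper. This is a sketch only: the tolerance used to decide that a ratio is "about 1" is an assumption introduced for illustration, and ratios between 0.25 and 1 are reported as intermediate because the text does not assign them a category.

```python
def classify_ka_ks(ka, ks, neutral_tolerance=0.05):
    """Label a K A /K S ratio using the thresholds quoted in the text.

    `neutral_tolerance` is a hypothetical parameter: ratios within this
    distance of 1 are treated as consistent with neutral evolution.
    """
    if ks <= 0:
        raise ValueError("K S must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 0.25:
        return "purifying selection"
    if abs(ratio - 1.0) <= neutral_tolerance:
        return "neutral evolution"
    if ratio > 1.0:
        return "positive selection"
    return "intermediate (not classified in the text)"

# Example values chosen to give a ratio near 0.1, similar in size to the
# median ratios reported for 1:1 orthologues.
label = classify_ka_ks(ka=0.019, ks=0.19)
print(label)
```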

Evolutionary rates were first calculated from a reduced set of 
orthologue pairs that are embedded in orthologous genomic seg- 
ments and are related by conservative values of K s (Table 3) 
(Methods). A slight increase in median K s values for rat-human 
as compared with mouse-human, was found, indicating that the rat 
lineage has more neutral substitutions in gene coding regions than 
the mouse lineage. Sequence conservation values were similar to 
those previously found using smaller data sets 127,128 , and the overall 
trend is consistent with results of other evolutionary rate analyses 
discussed above (Fig. 5). 

Next, we investigated examples of rat genes shared with mouse, 
but with no counterparts in human. Such genes might be rapidly 
evolving so that homologues are not discernible in human, or they 
might have arisen from non-coding DNA, or their orthologues in 
the human lineage might have formed pseudogenes. Thirty-one 
Ensembl rat genes were collected that have no non-rodent homo- 
logues in current databases (Methods). These are twofold over- 
represented among genes in paralogous gene clusters, and threefold 
over-represented among genes whose proteins are likely to be 
secreted. This is consistent with observations 3 that clusters of 
paralogous genes, and secreted proteins, evolve relatively rapidly. 
Detailed examination of the 31 genes using PSI-BLAST determined 
that ten genes cannot be assigned homology relationships to 
experimentally described mammalian genes. These ten rodent-specific 
genes may have evolved particularly rapidly, or have non-coding 
DNA homologues, or be erroneous predictions. 

Table 3 One-to-one orthologous genes in human, mouse and rat genomes 

                                 Human-mouse          Human-rat            Mouse-rat
1:1 orthologue relationships     11,084               10,066               11,503
Median K s values*               0.56 (0.39-0.80)     0.57 (0.40-0.82)     0.19 (0.13-0.26)
Median K A /K S values*          0.10 (0.03-0.24)     0.09 (0.03-0.21)     0.11 (0.03-0.28)
Median % amino acid identity*    88.0% (74.4-96.3%)   88.3% (75.9-96.4%)   95.0%† (88.0-98.7%)
Median % nucleotide identity*    85.1% (77.4-90.0%)   85.1% (77.8-89.9%)   93.4% (89.2-95.7%)

Data obtained from Ensembl, Homo sapiens version 11.31 (24,841 genes), Mus musculus version 10.3 (22,345 genes), Rattus norvegicus version 11.2 (21,022 genes). 
*Numbers in parentheses represent the 16th and 83rd percentiles. 
†This value is consistent with previous findings (93.9% in ref. 130). 

The paucity of rodent-specific genes indicates that de novo 
invention of complete genes in rodents is rare. This is not unex- 
pected, because the majority of eukaryotic protein-coding genes are 
modular structures containing coding and non-coding exons, spli- 
cing signals and regulatory sequences, and the chances of indepen- 
dent evolution and successful assembly of these elements into a 
functional gene are small, given the relatively short evolutionary 
time available since the mouse-rat split. However, individual 
rodent-specific exons may arise more frequently, particularly if 
the exon is alternatively spliced 129 . Applying a K A /K S ratio 
test 130,131 to sequences that align only between rat and mouse, we 
identified 2,302 potential novel rodent-specific exons, with EST 
support, in BLASTZ alignments of rat and mouse sequences. None 
of these individual exons matched human transcripts, but approximately 
half (1,116) appear to be present in alternative splice forms 
found in rodents. We speculate that these exons contain the few 
successful lineage-specific survivors of the constant process of gene 
evolution, by birth and death of individual exons. 

Indels and repeats in protein-coding sequences 

In contrast to small indels occurring in the bulk of the genome 
(above), indels within protein-coding regions are probably lethal, or 
deleterious and so are rapidly removed from the population by 
purifying selection. Indel rates within rat coding sequences were 50- 
fold lower than in bulk genomic DNA 132 . The whole genome excess 
of deletions compared with insertions (Fig. 5b) was also evident in 
coding sequences. The magnitude was less, with a genome-wide 
deletion-to-insertion ratio of 3.1:1 reducing to 1.7:1 in the rat. In 
mouse this value reduced from 2.5:1 to 1.1:1 (ref. 132). These data 
suggest that deletions are ~16% more likely than insertions to be 
removed from coding sequences by selection. 

Owing to the triplet nature of the genetic code, indels of multiples 
of three nucleotides in length (3n indels) are less likely to be 
deleterious. Direct comparison of 3n indel rates between bulk 
DNA (0.77 indels per kb for mouse, 0.83 indels per kb for rat) 
and coding sequence (0.087 indels per kb for mouse and 0.084 
indels per kb for rat) showed that 3n indels were ninefold under-represented 
in coding sequences. At least 44% of indels were 
duplicative insertion or deletion of a tandemly duplicated sequence, 
collectively termed sequence slippage 132 . Sequence slippage con- 
tributed approximately equally to observed insertions and del- 
etions. The overall excess of deletions could be attributed 
specifically to an excess of non-slippage deletion over non-slippage 
insertion in both mouse and rat lineages 132 . Of the slippage indels, 
13% were in the context of trinucleotide repeats (n > 2, excluding 
the inserted or deleted sequence) which are known to be particularly 
prone to sequence slippage and encode homopolymeric amino acid 
tracts 133,134 . 
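
The "ninefold" under-representation of 3n indels in coding sequence can be checked from the rates quoted above. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
# 3n indel rates (indels per kb) as quoted in the text.
bulk_rate = {"mouse": 0.77, "rat": 0.83}
coding_rate = {"mouse": 0.087, "rat": 0.084}

# Fold depletion in coding sequence relative to bulk genomic DNA.
depletion = {sp: bulk_rate[sp] / coding_rate[sp] for sp in bulk_rate}

for species, fold in depletion.items():
    print(f"{species}: 3n indels are {fold:.1f}-fold rarer in coding sequence")
```

The two ratios fall a little below and a little above nine, so "ninefold" is a rounded summary of both lineages.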

To gain better understanding of dynamic changes in the length of 
homopolymeric amino acid tracts on gene evolution and disease 
susceptibility, we searched for other characteristics of amino acid 
repeat variation by analysing all size-five or longer amino acid 
repeats in a data set of 7,039 rat, mouse and human orthologous 
protein sequences 135 . Most species-specific amino acid repeats (80- 
90%) were found in indel regions, and regions encoding species- 
specific repeats were more likely to contain tandem trinucleotide 
repeats than those encoding conserved repeats. This was consistent 
with the involvement of slippage in the generation of novel repeats 
in proteins and extended previous observations for glutamine 
repeats in a more limited human-mouse data set 136 . 

The percentage of proteins containing amino acid repeats was 
13.7% in rat, 14.9% in mouse and 17.6% in human 135 . The most 
frequently occurring tandem amino acid repeats were glutamic acid, 
proline, alanine, leucine, serine, glycine, glutamine and lysine. 
Using the same threshold size cut-off, tandem trinucleotide repeats 
were significantly more abundant in human than in rodent coding 
sequences, in striking contrast to the frequencies observed in bulk 
genomic sequences (29 trinucleotide repeats per Mb in rat, 32 
repeats per Mb in mouse and 13 repeats per Mb in human, see 
discussion of the general simple repeat structure below). The 
conservation of human repeats was higher in mouse (52%) than 
in rat (46.5%), suggesting a higher rate of repeat loss in the rat 
lineage than the mouse lineage. 

Functional consequences of these in-frame changes in rat, mouse 
and human were investigated 132 through clustering of proteins 
based on annotation of function and cellular localization 112 , and 
mapping indels onto protein structural and sequence features. The 
rate that indels accumulated in secreted (3.9 × 10⁻⁴ indels per 
amino acid) and nuclear (4.0 × 10⁻⁴) proteins is approximately 
twice that of cytoplasmic (2.4 × 10⁻⁴) and mitochondrial 
(1.4 × 10⁻⁴) proteins. Likewise, ligand-binding proteins acquire 
indels (3.1 × 10⁻⁴) at a higher rate than enzymes (2.1 × 10⁻⁴) 132 . 
These trends exactly mirror those observed for amino acid substitution 
rates 3 , suggesting tight coupling of selective constraints 
between indels and substitutions. Transcription regulators showed 
the highest rate of indels (4.3 × 10⁻⁴), a finding that may relate to 
the over-representation of homopolymeric amino acid tracts in 
these proteins 135 . 

Known protein domains exhibited 3.3-fold fewer indels than 
expected by chance, again paralleling nucleotide substitution rate 
differences between domains and non-domain sequences 3 . Of 
the protein-sequence and structural categories considered (trans- 
membrane, protein domain, signal peptide, coiled coil and low 
complexity), the transmembrane regions were the most refractory 
to accumulating indels, exhibiting a sixfold reduction compared 
with that expected by chance. Low-complexity regions were 3.1-fold 
enriched, reflecting their relatively unstructured nature and enrich- 
ment in indel-prone trinucleotide repeats. Mapping of indels onto 
groups of known structures revealed that indels are 21% more likely 
to be tolerated in loop regions than the structural core of the 
protein 132 . 

We observed that indel frequency and amino acid repeat occur- 
rence both correlated positively with the G + C coding sequence 
content of the local sequence environment 132,135 . This may be 
explained in part by the correlation of polymerase slippage-prone 
trinucleotide repeat sequences and G + C content 135 . There is also a 
positive correlation between CpG dinucleotide frequency and cod- 
ing sequence insertions, but not deletions. This effect diminishes 
rapidly with increasing distance from the site of the insertion 132 . 

Transcription-associated substitution strand asymmetry 

A recent study reported a significant strand asymmetry for neutral 
substitutions in transcribed regions 133 . Within introns of nine genes, 
the higher rate of A→G substitutions over that of T→C substitutions, 
together with a smaller excess of G→A over C→T substitutions, 
leads to an excess of G+T over C+A on the coding strand 
(also verified on human chromosome 22). The authors 133 hypothesized 
that the asymmetries are a byproduct of transcription-coupled 
repair in germline cells. Examining the three-way alignments 
of rat, mouse and human, we verified that the strand 
asymmetries for neutral substitutions exist in introns across the 
genome (Table 4). 

Table 4 Strand asymmetry of substitutions in introns of rat genes 

Base frequencies on coding strand*
              (G+T)/(C+A)
Rat genome    1.060

Ratio of purine transitions to pyrimidine transitions†
              Rate(A↔G)/Rate(C↔T)
Rat-mouse     1.036
Rat-human     1.036

Rate of transitions‡
              Rate(A→G)/Rate(T→C)    Rate(G→A)/Rate(C→T)
Rat           1.058                  1.017
Mouse         1.091                  1.00

*Computed from the rat genome. 
†Computed from pairwise alignments. 
‡Computed from three-way alignments. 

Under the assumption of independence of sequence positions, 
large sample normal approximations to the binomial distribution 
allow us to test whether the fraction of G+T exceeds 0.5, and 
whether the rate at the numerator exceeds the rate at the denomi- 
nator for each of the ratios in Table 4. With the large amount of data 
provided by pooling introns genome-wide, the tests are all highly 
significant (P values < 10⁻⁴), except for the rate of G→A 
in mouse, which does not significantly exceed that of C→T 
(P value = 0.6369). These asymmetries are also seen if the study 
is limited to ancestral repeat sites, excludes ancestral repeat sites, 
excludes CpG dinucleotides, is limited to positions flanked by sites 
that are identical in the aligned sequences (in the case of obser- 
vations 2 and 3 in Table 4), or considers introns of RefSeq genes for 
human or mouse. Thus it appears that strand asymmetry of 
substitution events within transcribed regions of the genome is a 
robust genome-wide phenomenon. 
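
The test described above for the G+T fraction can be sketched as a one-sided z-test based on the normal approximation to the binomial. The number of intronic positions used below is a made-up placeholder, since the text does not give the count; only the ratio 1.060 comes from Table 4.

```python
from math import erfc, sqrt

def z_test_fraction_above_half(successes, trials):
    """One-sided test of H0: p = 0.5 against H1: p > 0.5.

    Uses the large-sample normal approximation to the binomial and assumes
    independent positions, as stated in the text.
    Returns (z statistic, upper-tail P value).
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    standard_error = sqrt(0.25 / trials)  # sqrt(p0 * (1 - p0) / n) with p0 = 0.5
    z = (p_hat - 0.5) / standard_error
    p_value = 0.5 * erfc(z / sqrt(2.0))   # P(Z > z) for a standard normal
    return z, p_value

def fraction_from_ratio(ratio):
    """Convert a ratio a/b into the fraction a/(a+b)."""
    return ratio / (1.0 + ratio)

# (G+T)/(C+A) = 1.060 from Table 4; one million positions is a hypothetical
# sample size chosen only to make the example concrete.
n_positions = 1_000_000
gt_count = round(fraction_from_ratio(1.060) * n_positions)

z_stat, p_val = z_test_fraction_above_half(gt_count, n_positions)
print(f"z = {z_stat:.1f}, P < 1e-4: {p_val < 1e-4}")
```

With samples of this size even a small departure from 0.5 gives a very large z statistic, which is why pooling introns genome-wide makes almost every comparison highly significant.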

Conservation of intronic splice signals 

Using 6,352 human-mouse-rat orthologous introns from 976 genes 
(Methods), we examined the dynamics of evolution of consensus 
splice signals in mammalian genes. We found that intron class 137 is 
extremely well conserved; we did not observe any U2 to U12 intron 
conversion, or vice versa, nor within U12 introns did we find any 
switching between the major AT-AC and GT-AG subtypes, 
although such events are documented at larger evolutionary dis- 
tances 137 . In contrast, conversions between canonical GT-AG and 
non-canonical GC-AG subtypes of U2 introns are not uncommon. 
Only ~70% of GC-AG introns are conserved between human and 
mouse/rat, and only 90% are conserved between mouse and rat. 
Using human as the outgroup, we detected nine GT to GC conver- 
sions after divergence of mouse and rat (from 6,282 introns that 
were likely to have been GT-AG before human and rodents split), 
and two GC to GT conversions (from 34 GC-AG introns that 
probably predated the human and rodent split). These results give 
some indication of the degree to which mutation from T to C is 
tolerated in donor sites. The GC donor site appears to be better 
tolerated in introns with very strong donor sites, because in these 
introns the proportion of GC donor sites is much higher 
than the 0.7% overall frequency of GC donor sites in U2 introns. 
Although we found a variety of other non-canonical configurations 
in U2 introns, very few are conserved, which suggests that 
most correspond to transient, evolutionarily unstable states, 
pseudogenes, or mis-annotations. 
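
The conversion counts above can be turned into per-intron fractions. This is only a restatement of the quoted numbers; with counts as small as nine and two, the fractions are rough indications rather than precise rates.

```python
def conversion_fraction(conversions, introns_at_risk):
    """Fraction of introns of a given donor type that switched type."""
    if introns_at_risk <= 0:
        raise ValueError("introns_at_risk must be positive")
    return conversions / introns_at_risk

# Counts quoted in the text for changes after the mouse-rat divergence.
gt_to_gc = conversion_fraction(9, 6282)  # ancestral GT-AG introns
gc_to_gt = conversion_fraction(2, 34)    # ancestral GC-AG introns

print(f"GT to GC: {gt_to_gc:.2%} of GT-AG introns")
print(f"GC to GT: {gc_to_gt:.1%} of GC-AG introns")
```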

Gene duplications 

Duplication of genomic segments represents a frequent and robust 
mechanism for generating new genes 138 . Because there were no 
compelling data showing rat-specific genes arising directly from 
non-coding sequences, we examined gene duplications to measure 
their potential contribution to rat-specific biology. A previous study 
showed that gene clusters in mouse without counterparts in human 
are subject to rapid, adaptive evolution 3,139 . We used two methods to 
identify recent gene duplications: methods that directly identified 
paralogous clusters, and methods that analysed genomic segmental 
duplications (see above). 

Using the first approach, we found 784 rat paralogue clusters 
containing 3,089 genes (Methods). This was lower than in mouse 
(910 clusters/3,784 genes), but the difference probably reflects the 
larger number of gene predictions from the mouse assembly. 

To investigate the timing of expansion of these individual 
families, we measured rates of local gene duplication and retention 
within clusters. BLAST is not suited to this 140,141 and so we 
instead calculated the number of synonymous substitutions per 
synonymous site (K s ) between all pairs of homologous genes; 
constructed K s -derived phylogenetic trees; and predicted orthology 
or paralogy gene duplication events automatically from their 
topologies (Supplementary Information). The results showed that 
the neutral substitution rate varies among orthologues by approxi- 
mately twofold (Fig. 10). This is similar to chromosomal variation 
shown previously by a study of mouse and human ancestral 
repeats 3 . Rates of change among ancestral gene duplications 
(those that predate the mouse-rat split) were relatively constant. 
Mouse-specific and rat-specific duplications occurred at similar 
rates, except for those with K s < 0.04, which are reduced in mouse-specific 
duplications (Fig. 10). More data are required to determine 
whether this reduction is a biological effect, as it might be accounted 
for by different protocols for assembling mouse and rat genomes, 
which differentially collapse areas of nearly identical sequence. 

The rat paralogue pairs that probably arose after the rat-mouse 
split (12-24 Myr ago) have K s values of <0.2 (Table 3). We found 
649 K s < 0.2 gene duplication events in rat, a lower number than is 
found in mouse (755). For both rodents, this represents a likelihood 
of a gene duplicating of between 1.3 × 10⁻³ and 2.6 × 10⁻³ every 
Myr. These are necessarily estimates, because gene deletions, con- 
versions and pseudogene formation are not considered. Interest- 
ingly, the data are consistent with a previous estimate for Drosophila 
genes, but are an order of magnitude lower than an estimate for 
Caenorhabditis elegans genes 140 . 
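
The quoted range can be reproduced for rat with a simple calculation. The text does not state which gene count was used as the denominator; the sketch below assumes the 20,973 Ensembl gene predictions reported earlier, and that assumption happens to give the stated bounds.

```python
def duplication_rate(events, genes, elapsed_myr):
    """Duplication events per gene per million years."""
    if genes <= 0 or elapsed_myr <= 0:
        raise ValueError("genes and elapsed_myr must be positive")
    return events / (genes * elapsed_myr)

RAT_EVENTS = 649           # K s < 0.2 duplication events in rat (from the text)
RAT_GENES = 20_973         # assumption: Ensembl gene count used as denominator
SPLIT_MYR = (12.0, 24.0)   # rat-mouse split, 12-24 Myr ago (from the text)

# An older split spreads the same events over more time, giving the lower rate.
low_rate = duplication_rate(RAT_EVENTS, RAT_GENES, max(SPLIT_MYR))
high_rate = duplication_rate(RAT_EVENTS, RAT_GENES, min(SPLIT_MYR))

print(f"{low_rate:.1e} to {high_rate:.1e} duplications per gene per Myr")
```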

A subset of clusters have at least three gene duplications with 
K s < 0.2 (Table 5). These are expected to be enriched in genes 
whose duplications persist as a consequence of positive selection. 
The group is dominated by genes involved in adaptive immune 
response and chemosensation 87 . Inspection of the K s -derived trees 
allowed us to infer the gene numbers in these clusters for the 
common ancestor of rat and mouse (that is, at K s = 0.2), assuming 
no gene deletions or pseudogene generation (Table 5). Immunoglobulin, 
T-cell receptor α-chain, and α2u-globulin genes appear to 
be duplicating at the fastest rates in the rat genome (Table 5). Since 
divergence with mouse, these rat clusters have increased gene 
content several-fold. This recapitulates previous observations that 
rapidly evolving and duplicating genes are over-represented in 
olfaction and odorant detection, antigen recognition and reproduc- 
tion 142 . 
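
The "several-fold" increase can be made concrete by dividing extant by inferred ancestral cluster sizes from Table 5. The sketch below uses three of the clusters named in the paragraph above; the dictionary layout is illustrative.

```python
# (extant cluster size, ancestral cluster size) taken from Table 5.
clusters = {
    "Immunoglobulin kappa-chain V": (60, 22),
    "TCR alpha-chain V": (53, 15),
    "alpha2u-globulin": (5, 1),
}

fold_increase = {
    name: extant / ancestral for name, (extant, ancestral) in clusters.items()
}

for name, fold in fold_increase.items():
    print(f"{name}: {fold:.1f}-fold increase since the rat-mouse split")
```

These ratios inherit the caveat stated for Table 5: ancestral sizes are inferred assuming no gene deletions or pseudogene generation.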

An examination of duplicated genomic segments showed this 
enrichment for most of the same genes and also elements involved 
in foreign compound detoxification (cytochrome P450 and 
carboxylesterase genes) 87 . Together, these are exciting findings 
because each of these categories can easily be associated with a 
familiar feature of rat-specific biology, and further investigation 
could explain some differences between rats and their evolutionary 
neighbours. 

Figure 10 Variation in the frequency of gene duplications during the evolutionary histories 
of the rat and mouse. The sequence of gene duplication events was inferred from 
phylogenetic trees determined from pairwise estimates of genetic divergence under 
neutral selection (K s ; Methods). The median K s value for mouse:rat 1:1 orthologues is 
0.19. This value corresponds to the divergence time of mouse and rat lineages. 

Conservation of gene regulatory regions 

As the third mammal to be fully sequenced, the rat can add 
significantly to the utility of nucleotide alignments for identifying 
conserved non-coding sequences 143-147 . This power increases 
roughly as a function of the total amount of neutral substitution 
represented in the alignment 97,98 , and rat adds about 15% to the 
human-mouse comparison (Fig. 5). Many conserved mammalian 
non-coding sequences are expected to have regulatory function, and 
can be predicted using further analyses based upon these alignments 
93,148-150 . 

We applied such methods for detecting significantly conserved 
elements 97,151 and scoring regulatory potential 148,152 to the genome- 
wide human-mouse-rat alignments. Typical results show strong 
conservation for a coding exon, as well as for several non-coding 
regions (Fig. 11). For example, the intronic region in Fig. 11 
contains 504 bp that are highly conserved in human, mouse and 
rat. The last 100 bp of this alignment block are identical in all three 
species. Peaks in regulatory potential score are correlated with 
conservation score, and in the highly conserved intronic segment, 
they are higher for the three-way regulatory potential score than for 
the two-way scores using human and just one rodent 152 . These data 
are illustrative, but form the foundation of ongoing efforts to 
identify genome sequences involved in gene regulation. 

Requiring conservation among mammalian genomes greatly 
increases the specificity of predictions of transcription factor bind- 
ing sites. Transcription factor databases such as TRANSFAC 153 
contain known transcription factor binding sites and some knowl- 
edge of their distribution, but simply searching a sequence with 
these motifs provides little discriminatory power. For example, all of 
the 85 known regulatory elements 148 and 151 functional promoters 154 
have TRANSFAC matches, but so do 99% of the 2,049,195 
mammalian ancestral repeats, most representing false-positive pre- 
dictions. The introduction of conservation as a criterion for 
regulatory element identification greatly increases specificity, with 
only a modest cost in sensitivity. If we insist that the TRANSFAC 
matches be present and orthologously aligned in all three species — 
human, mouse and rat— then only 268 matches are recorded in 
ancestral repeats (0.01%), while 63 (74%) of the above matches in 
known regulatory elements and 121 (80%) in functional promoters 
are retained. Overall, using a set of 164 weight matrices for 109 
transcription factors extracted from TRANSFAC 153 , we find 
186,792,933 matches in the April 2003 reference human genome 
sequence, but this was reduced to only 4,188,229 by demanding 
conservation in the human-mouse-rat three-way alignments. This 
is a 44-fold increase in specificity. 
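
The percentages and the 44-fold figure in this paragraph follow from the counts quoted in the text, as the short sketch below shows. The variable names are illustrative.

```python
# Counts quoted in the text.
ANCESTRAL_REPEATS = 2_049_195
REPEATS_WITH_CONSERVED_MATCH = 268
KNOWN_ELEMENTS, KNOWN_RETAINED = 85, 63
PROMOTERS, PROMOTERS_RETAINED = 151, 121
GENOME_MATCHES = 186_792_933
CONSERVED_MATCHES = 4_188_229

# Share of likely false positives (ancestral repeats) that survive the filter.
false_positive_share = REPEATS_WITH_CONSERVED_MATCH / ANCESTRAL_REPEATS

# Share of true elements that survive the filter (sensitivity retained).
element_sensitivity = KNOWN_RETAINED / KNOWN_ELEMENTS
promoter_sensitivity = PROMOTERS_RETAINED / PROMOTERS

# Genome-wide reduction in the number of predicted sites.
fold_reduction = GENOME_MATCHES / CONSERVED_MATCHES

print(f"ancestral repeats retained: {false_positive_share:.2%}")
print(f"regulatory elements retained: {element_sensitivity:.0%}")
print(f"functional promoters retained: {promoter_sensitivity:.0%}")
print(f"genome-wide reduction: {fold_reduction:.1f}-fold")
```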

We examined one region in more detail: a complex cis-regulatory 
region consisting of a 4,000 bp segment containing two regulatory 
modules, hypersensitive sites 2 and 3 from the locus control region 
of the HBB complex 155-157 . Considerable experimental work has 
identified six functional binding sites for the transcription factor 
GATA-1 in this segment. Requiring that matches to GATA-1 binding 
sites be conserved in all three species and occur within regions of 
strong regulatory potential is sufficient to find these six functional 
binding sites, and only these six, in the 4,000 bp segment. Thus, in 
this example we observed complete sensitivity and specificity by 
requiring this level of conservation. 

Pseudogenes and gene loss 

To complement the identification and analysis of protein-coding 
regions, we sought to examine rat pseudogenes. Using a previously 
described method 158,159 , we found 18,755 pseudogenes in intergenic 
regions. Pseudogenes are normally not subjected to selective con- 



Table 5 Recent gene duplications (K s < 0.2) in the rat lineage 

Cluster  Recent       Numbers   Extant   Ancestral  Chromosome  Annotation                                Process
ID       duplication  of genes  cluster  cluster
         events       involved  size     size

249      38           53        60       22         4           Immunoglobulin κ-chain V                  Immunity
640      38           47        53       15         15          TCR α-chain V                             Immunity
346      25           35        44       15         6           Immunoglobulin heavy chain V              Immunity
190      22           42        168      146        3           Olfactory receptor                        Chemosensation
578      16           28        59       43         13          Olfactory receptor                        Chemosensation
400      15           26        82       67         8           Olfactory receptor                        Chemosensation
743      15           21        37       22         20          Olfactory receptor                        Chemosensation
72       12           22        102      90         1           Olfactory receptor                        Chemosensation
500      12           18        32       20         10          Olfactory receptor                        Chemosensation
51       6            7         16       10         1           Glandular kallikrein                      Reproduction?
256      6            8         10       4          4           Vomeronasal receptor V1R                  Chemosensation
488      6            10        11       5          10          Olfactory receptor                        Chemosensation
644      6            10        14       8          15          Granzyme serine protease                  Immunity
4        5            6         9        4          1           Trace amine receptor, GPCR                Neuropeptide receptors?
248      5            9         15       10         4           Vomeronasal receptor V1R                  Chemosensation
393      5            10        31       26         8           Olfactory receptor                        Chemosensation
522      5            8         19       14         10          Keratin-associated protein                Epithelial cell function
550      5            8         17       12         11          Olfactory receptor                        Chemosensation
635      5            9         20       15         15          Olfactory receptor                        Chemosensation
79       4            8         38       34         1           Olfactory receptor                        Chemosensation
88       4            6         11       7          1           Olfactory receptor                        Chemosensation
109      4            7         43       39         1           Olfactory receptor                        Chemosensation
294      4            5         5        1          5           α2u-globulin                              Chemosensation
310      4            5         11       7          5           Olfactory receptor                        Chemosensation
353      4            7         13       9          7           Olfactory receptor                        Chemosensation
399      4            5         6        2          8           Ly6-like urinary protein                  Chemosensation?
638      4            6         6        2          15          RNase A                                   Immunity
690      4            6         21       17         17          Prolactin paralogue                       Reproduction
239      3            6         6        3          4           Prolactin-induced protein                 Reproduction
253      3            4         5        2          4           Camello-like N-acetyltransferase          Developmental regulator
274      3            6         20       17         4           Ly-49 lectin natural killer cell protein  Immunity
297      3            4         5        2          5           Interferon-α                              Immunity
523      3            4         6        3          10          Keratin-associated protein                Epithelial cell function
746      3            5         6        3          20          MHC class Ib (M10)                        Chemosensation

Duplications involving retroviral genes, fragmented genes with internal repeats, and likely pseudogene clusters were removed from this list. Only gene clusters exhibiting at least three duplications are 
shown. 
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straint and therefore accumulate sequence modifications neutrally. 
Indeed, nearly all of our identified pseudogenes (97 ± 3%) evolved 
under neutrality according to a K A IK S test, and therefore are 
consistent with being pseudogenic. 

We classified these pseudogenes according to whether they arose 
from retrotransposition, in which case they integrated into the 
genome randomly, or whether they arose from tandem duplication 
and neutral sequence substitution. Using human-rat synteny, we 
found that 80% of pseudogenes exhibited no significant similarity 
to the corresponding human orthologous region, and therefore 
were considered retrotransposed, processed pseudogenes. The total 
pseudogene count, and processed pseudogene proportion, are 
consistent with those found for human 158,159 . These numbers are 
greater than those previously reported for mouse 3,4 . However, 
reanalysis using the method employed here detects a similar 
pseudogene number (20,000) to that found for human and rat. 
This suggests that the rate of pseudogene creation is similar among 
these mammals. 
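
The synteny-based classification described above reduces to a simple rule, sketched here under stated assumptions. The boolean input (whether a pseudogene shows significant similarity to the corresponding human orthologous region) and the function names are hypothetical; how significance was actually assessed is not specified in this section.

```python
from collections import Counter


def classify(similar_to_human_syntenic_region):
    """Label a pseudogene by the rule in the text: no significant
    similarity in the orthologous human region implies it was
    retrotransposed (processed); otherwise it is treated as having
    arisen in place by duplication and neutral substitution."""
    return "duplicated" if similar_to_human_syntenic_region else "processed"


def processed_fraction(similarity_flags):
    """Fraction of pseudogenes classified as processed."""
    counts = Counter(classify(flag) for flag in similarity_flags)
    total = sum(counts.values())
    return counts["processed"] / total if total else 0.0


# Toy input: 8 of 10 pseudogenes lack similarity in the syntenic region.
flags = [False] * 8 + [True] * 2
print(processed_fraction(flags))
```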

As with the human genome 159,160 , the largest group of rat 
pseudogenes (totalling 2,188), according to InterPro 161 , consists of 
ribosomal protein genes. Other large rat pseudogene families arose 
from olfactory receptors (552, see below), glyceraldehyde-3-phosphate 
dehydrogenase (GAPDH) (251), protein kinases (177), and 
RNA binding RNP-1 proteins (174). Pseudogenes homologous to a 
meiotic spindle-associated protein, spindlin 162 , are particularly 
numerous in rat (at least 53 copies) compared with mouse 
(approximately three copies). This suggests that spindlin pseudogenes 
may have distributed rapidly by a recently active transposable 
element. 

We investigated the much-studied metabolic enzyme 
GAPDH 3,163 , and observed that: (1) the GAPDS gene arose from a 
duplication of the GAPDH gene; (2) biogenesis of the GAPDH 
pseudogenes has been occurring steadily over time both before and 
after rodent-human and mouse-rat divergence; and (3) the GAPDS 
gene has undergone little retrotransposition in all three genomes 
compared with its relative, the GAPDH gene (consistent with 
respective gene-expression levels in the germ line). 

In situ loss of rat genes 

As an organism evolves, its need for certain genes may be reduced, 
or lost, owing to changes in its ecological niche. Loss of selective 
constraints leads to accumulation of nonsense and/or frameshift 
mutations without retrotransposition or duplication. These non- 
processed pseudogenes are interesting because they link environ- 
mental changes to genomic mutation events. However, predicted 
pseudogenes with disrupted reading frames might also be indicative 
of errors in genome sequence or assembly. By constraining the 
search to orthologous genomic regions, we identified 14 rat putative 
non-processed pseudogenes (Table 6) with apparently functional, 
single human and mouse orthologues. Half of these contain one in- 
frame stop or frameshift, whereas the remainder contain more. We 
expect this number of identified pseudogenic orthologues to be 
conservative because the methods employed required high fidelity 
of both gene prediction and orthologue identification in all three 
species (Methods). 

Nevertheless, as only 14 recently evolved pseudogene candidates 
were identified, this indicates that the genome sequence and 
assembly (Rnor3.1) is of high quality. The improved quality of 
the most recent assembly is underscored by 11 additional candidate 
pseudogenes, predicted from rat assembly Rnor2.1, that are apparently 
functional, full-length genes in Rnor3.1. Consequently, some 
of the current 14 candidates, in particular those that are involved in 
fundamental processes of eukaryotic biology, may yet be 'repaired' 
by sequence changes in future assemblies, and thus be recognized as 
genic. However, genes associated with innate immunity (which is 
particularly susceptible to change via adaptive evolution), such as 
Forssman glycolipid synthetase and complement factor I, may yet be 
found to survive as true pseudogenes in the rat. 



[Figure 11 plot. x-axis: sequence position (human), 10,205,500 to 10,207,500. Tracks: regulatory potential (human-mouse-rat), regulatory potential (human-rat), regulatory potential (human-mouse), and sequence conservation (human-mouse-rat).] 
Figure 11 Close-up of PEX14 (peroxisomal membrane protein) locus on human 
chromosome 1 (with homologous mouse chromosome 4 and rat chromosome 5). 
Conservation score computed on three-way human-mouse-rat alignments (parsimony 
P values 151 ) presents a clear coding exon peak (grey bar) and very high values in a 504 bp 
non-coding, intronic segment (right; last 100 bp of alignment are identical in all three 
organisms). The latter segment showed a striking difference between the inferred mouse 
and rat branch lengths 110,111,222 : the grey bracket corresponds to a phylogenetic tree 
where the logarithm of mouse to rat branch-length ratio is -6. Regulatory potential 
scores 1 " 9,152 that discriminate between conserved regulatory elements and neutrally 
evolving DNA are calculated from three-way (human-mouse-rat) and two-way (human- 
rodent) alignments. Here the three-way regulatory potential scores are enhanced over the 
two-way scores. 



Non-coding RNA genes 

We investigated the abundance and distribution of non-coding 
(nc)RNAs in rat. Cytoplasmic transfer (t)RNA gene identification 
in rodents is complicated by tRNA-derived identifier (ID) short 
interspersed nucleotide elements (SINEs) (B2 and ID). tRNAscan-SE 
predicted 175,943 tRNAs (genes and pseudogenes); however, the 
majority (175,285) were SINEs identified by RepeatMasker. This 
is far greater than the number found in mouse (24,402/25,078) or 
human (25/636). Of the remaining 666 predictions, 163 were 
annotated as tRNA pseudogenes and four were annotated as 
undetermined by tRNAscan-SE. An additional 68 predictions 
were removed because their best database match in either human, 
mouse or rat tRNA databases matched tRNAs with either a different 
amino acid or anticodon (violating the wobble rules that specify the 
distinct anticodons expected). The total of 431 tRNAs (including a 
single selenocysteine tRNA) identified in the rat genome is comparable 
to that for mouse (435 tRNAs; version mm2 from the UCSC genome 
browser) and human (492 tRNAs; from the genomic tRNA database, 
http://rna.wustl.edu/GtRDB/Hs/). These three species share a core set 
of approximately 300 tRNAs, using a cutoff of ≥95% sequence identity 
and ≥95% sequence length. 
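
The successive filtering steps that lead from 666 predictions to 431 tRNAs can be retraced as simple arithmetic. The sketch below uses only the counts quoted in the text; the function and parameter names are introduced here for illustration, and the `is_core_match` helper is an assumed reading of the ≥95% identity and ≥95% length cutoff rather than the authors' implementation.

```python
def remaining_trnas(predictions=666, pseudogenes=163,
                    undetermined=4, mismatched=68):
    """Subtract each excluded category from the non-SINE predictions."""
    return predictions - pseudogenes - undetermined - mismatched


def is_core_match(identity, length_fraction, cutoff=0.95):
    """Assumed form of the shared-core test: both sequence identity and
    aligned length fraction must reach the cutoff."""
    return identity >= cutoff and length_fraction >= cutoff


print(remaining_trnas())
print(is_core_match(0.96, 0.97), is_core_match(0.94, 0.99))
```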

A total of 454 ncRNAs (other than tRNAs) were identified by 
sequence comparison to known ncRNAs (Supplementary Information). 
These include 113 micro- (mi)RNAs, five ribosomal RNAs, 287 small 
nucleolar (sno)RNAs and small nuclear (sn)RNAs, and 49 various other 
ncRNAs such as signal recognition particle (SRP) RNA, 7SK RNA, 
telomerase RNA, RNase P RNA, brain-specific repetitive (bsr)RNA, 
non-coding transcript abundantly expressed in brain (ntab)RNA and 
small cytoplasmic (sc)RNA; 626 pseudogenes were also identified. 
Complete 18S and 28S rRNA genes and more rRNAs were not 
identified, presumably owing to assembly issues. 

Table 6 Candidate rat pseudogenes, orthologous to mouse and human functional genes 

Mouse gene | Human gene | Strand | Rat genome coordinates* | Frameshifts/stops† | Annotation 
ENSMUSG00000013611 | ENSG00000174226 | - | 7:92752590-92807556 | 1/0 | Sorting nexin 
ENSMUSG00000024364 | ENSG00000158402 | + | 18:62742414-62770427 | 2/0 | Dual-specificity phosphatase CDC25c 
ENSMUSG00000026293 | ENSG00000077044 | + | 9:95634847-95692601 | 1/0 | Diacylglycerol kinase δ 
ENSMUSG00000026785 | ENSG00000160447 | + | 3:9210762-9229984 | 5/0 | Protein kinase PKNβ 
ENSMUSG00000026829 | ENSG00000148288 | + | 3:7662414-7664521 | 2/2 | Forssman glycolipid synthetase 
ENSMUSG00000027426 | ENSG00000125846 | + | 3:125918806-125924149 | 1/1 | Zinc finger protein 133 
ENSMUSG00000028000 | ENSG00000138799 | - | 2:221272797-221304350 | 1/0 | Complement factor I 
ENSMUSG00000029203 | ENSG00000078140 | - | 14:44385206-44441888 | 1/0 | Ubiquitin-protein ligase E2 (HIP2) 
ENSMUSG00000030270 | ENSG00000144550 | - | 20:8332585-8362331 | 3/0 | Copine (membrane trafficking) 
ENSMUSG00000035449 | ENSG00000167646 | + | 1:67374986-67381472 | 1/0 | Cardiac troponin I 
ENSMUSG00000037029 | ENSG00000105261 | - | 1:82728049-82730272 | 1/0 | Zinc finger protein 146 
ENSMUSG00000037432 | ENSG00000158142 | + | 9:42465695-42498651 | 1/1 | Dysferlin-like protein 
ENSMUSG00000039660 | ENSG00000167137 | - | 3:9320401-9326997 | 4/0 | Similar to yeast YMR310c RNA-binding protein 
ENSMUSG00000042653 | ENSG00000137634 | + | 8:49938446-49939091 | 1/0 | Brush border 61.9 kDa-like protein 

*Coordinates from rat v2.0. 

†Mouse genes were used as templates for predicting rat pseudogenes. 
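
The statement that half of the candidates carry a single in-frame stop or frameshift can be checked against the frameshifts/stops column of Table 6. The short tally below is a sketch: the list is transcribed from the 14 table rows in order, and the helper name is introduced here for illustration.

```python
# Frameshifts/stops entries of Table 6, in row order.
TABLE6_DISRUPTIONS = [
    "1/0", "2/0", "1/0", "5/0", "2/2", "1/1", "1/0",
    "1/0", "3/0", "1/0", "1/0", "1/1", "4/0", "1/0",
]


def total_disruptions(cell):
    """Sum the frameshift and stop counts in a 'frameshifts/stops' cell."""
    frameshifts, stops = cell.split("/")
    return int(frameshifts) + int(stops)


single = sum(1 for cell in TABLE6_DISRUPTIONS if total_disruptions(cell) == 1)
print(f"{single} of {len(TABLE6_DISRUPTIONS)} candidates have one disruption")
```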



Evolution of transposable elements 

Most interspersed repeats are immobilized copies of transposable 
elements that have accrued substitutions in proportion to their time 
spent fixed in the genome (for introduction 2,3,164-167 ). About 40% of 
the rat genome draft is identified as interspersed repetitive DNA 
derived from transposable elements, similar to that for the mouse 3 
(Table 7) and lower than for the human (almost 50% 2 ). The latter 
difference is mainly due to the lower substitution rate in the human 
lineage, which allows us to recognize much older (Mesozoic) 
sequences as interspersed repeats. Almost all repeats are derived 
from retroposons, elements that procreate via reverse transcription 
of their transcripts. As in mouse, there is no evidence for activity of 
DNA transposons since the rat-mouse split. Many aspects of the rat 
and the mouse genomes' repeat structure are shared; here we focus 
on the differences. 

LINE-1 activity in the rat lineage 

The long interspersed nucleotide element (LINE)-1 (L1) is an 
autonomous retroelement, containing an internal RNA polymerase 
II promoter and two open reading frames (ORFs). The ORF1 
product is an RNA binding protein with chaperone-like activity, 
suggesting a role in mediating nucleic acid strand transfer steps 
during L1 reverse transcription 168 , whereas ORF2 encodes a protein 
with both reverse transcriptase and DNA endonuclease activity. 
LINEs are characteristically 5' truncated so that only a small subset 
extends to include the promoter region and can function as a source 
for more copies. 

Many classes of LINE-like elements exist, but only L1 has been 
active in rodents. Over half a million copies, in variable stages of 
decay, comprise 22% of the rat genome. Although 10% of the 
human genome is comprised of L1 copies introduced before the 
rodent-primate split, owing to the fast substitution rate in the 
rodent lineage only 2% of the rat genome could be recognized as 
such. Thus, probably well over one-quarter of all rat DNA is derived 
directly from the L1 gene. 

Following the mouse-rat split, L1 activity appears to have 
increased in rat. The 3' UTR sequences defined six rat-specific L1 
subfamilies, represented by 150,000 copies that cover 12% of the rat 
genome. L1 copies accumulated over the same period in mouse 
cover only 10% of the genome (Table 7). This higher accumulation 
of L1 copies could explain some of the size difference of the rat and 
mouse genome. 

In addition to the traditional L1 elements, there are 7,500 copies 
(10 Mb) of a non-autonomous element that is derived from L1 by 
deletion of most of its ORF2. A similar element, active in Mesozoic 
times, has been called HAL1 (for Half-a-LINE) 164 . Given their low 
divergence, we conclude that the currently identified HAL1-like 
elements operated only a few million years ago in the mouse lineage 
(MusHAL1) and still propagate in the rat genome (RNHAL1). 
RNHAL1 contains only an ORF1, whereas MusHAL1 encoded an 
endonuclease as well, although no reverse transcriptase. The 5' 
2,600 bases of RNHAL1 are 98% identical to the currently active L1 
in rat (L1_Rn or L1mlvi2 169 ). Unlike ancient HAL1 elements, which 
shared the 3' UTR with a contemporary L1, the 3' end of RNHAL1 
is unrelated to other repeats. The repeated origin and high copy 
number of HAL1s suggest that the ORF1 product, which binds 
strongly to its messenger RNA 168 , may render this transcript a 
superior target for L1-mediated reverse transcription. In this way 
HAL1 resembles the non-autonomous, endogenous retrovirus-derived 
MaLR elements (below), which, for over 100 million 
years, retained only the retroviral gag ORF that encodes an RNA 
binding protein. A potential advantage of HAL1 over L1 is its 
shorter length, which, considering the usual 5' truncation of copies, 
increases the chance that a copy may include the internal promoter 
elements and become a source gene. 

Table 7 Composition of interspersed repeats in the rat genome 

Repeat class | Copies (x 10^3) | Total length (Mb) | Rat: fraction of genome (%) | Rat: lineage-specific (%) | Mouse: fraction of genome (%) | Mouse: lineage-specific (%) 
LINEs | 657 | 594.0 | 23.11 | 11.70 | 20.10 | 9.74 
LINE-1 | 597 | 584.2 | 22.73 | 11.70 | 19.65 | 9.74 
LINE-2 | 48 | 8.4 | 0.33 | | 0.38 | 
L3/CR1 | 11 | 1.4 | 0.06 | | 0.06 | 
SINEs | 1,360 | 181.3 | 7.05 | 1.52 | 7.78 | 1.80 
B1 (Alu) | 384 | 42.3 | 1.65 | 0.16 | 2.53 | 0.92 
B4 (ID_B1) | 359 | 55.4 | 2.15 | 0.00 | 2.25 | 0.00 
ID | 225 | 19.6 | 0.76 | 0.54 | 0.20 | 0.00 
B2 | 328 | 55.2 | 2.15 | 0.68 | 2.29 | 0.74 
MIR | 109 | 13.0 | 0.51 | | 0.56 | 
LTR elements | 556 | 232.4 | 9.04 | 1.84 | 10.28 | 2.85 
ERV class I | 40 | 24.9 | 0.97 | 0.56 | 0.79 | 0.36 
ERV class II | 141 | 83.4 | 3.24 | 1.02 | 4.13 | 1.73 
ERVL (III) | 74 | 21.6 | 0.84 | 0.04 | 1.08 | 0.23 
MaLRs | 302 | 102.5 | 3.99 | 0.22 | 4.27 | 0.53 
DNA elements | 108 | 20.9 | 0.81 | | 0.86 | 
Charlie (hAT) | 80 | 14.8 | 0.58 | | 0.60 | 
Tigger (Tc1) | 18 | 4.0 | 0.16 | | 0.17 | 
Unclassified | 14 | 7.3 | 0.28 | | 0.37 | 
Total | 2,690 | 1,036 | 40.31 | 14.90 | 39.45 | 14.26 
Small RNAs | 8 | 0.6 | 0.03 | 0.01 | 0.03 | 0.01 
Satellites | 14 | 6.4 | 0.25 | ? | 0.31 | ? 
Simple repeats | 897 | 61.1 | 2.38 | ? | 2.41 | ? 

Data for Rnor3.1 and October 2003 mouse (MM4), excluding Y chromosome, using the 17 December 2003 version of RepeatMasker. To highlight the differences between rat and mouse repeat content, 
columns 5 and 7 show the fractions of the genomes comprising lineage-specific repeats. The LINE-1 numbers include all HAL1 copies, whereas all BC1 scRNA and >10% diverged tRNA-Ala matches, far 
more common than other small RNA pseudogenes and closely related to ID, have been counted as ID matches. 
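
As a consistency check on Table 7, the total length and genome fraction reported for each rat repeat class should imply the same underlying genome size. The sketch below uses three rows of the table; the function name and the choice of rows are illustrative, not part of the original analysis.

```python
# (total length in Mb, fraction of rat genome in %) from Table 7.
TABLE7_RAT = {
    "LINEs": (594.0, 23.11),
    "SINEs": (181.3, 7.05),
    "LTR elements": (232.4, 9.04),
}


def implied_genome_size_mb(total_length_mb, fraction_percent):
    """Genome size implied by a class's length and its genome fraction."""
    return total_length_mb / (fraction_percent / 100.0)


sizes = {name: implied_genome_size_mb(length, frac)
         for name, (length, frac) in TABLE7_RAT.items()}
for name, size in sizes.items():
    print(f"{name}: {size:.0f} Mb")
```

All three rows give roughly 2.57 Gb, so the length and percentage columns agree with one another to within rounding.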



Different activity of SINEs in the rat and mouse lineage 

The most successful usurpers of the L1 retrotransposition machinery, 
however, are SINEs. These are small RNA-derived sequences 
with an internal RNA polymerase III promoter. Recently, the human 
Alu SINE has been experimentally proven to be transposed by L1 170 . 
Most SINEs share the 3' end with their associated LINE elements, 
like the Mesozoic mammalian LINE-2 (L2) and MIR pair, increasing 
the efficiency with which a LINE reverse transcriptase recognizes 
the 3' end of a dependent SINE. However, L1 does not show 
sequence specificity and rodent and primate SINE sequences are 
unrelated to L1. Although any transcript can be retroposed, as can 
be seen from the numerous processed pseudogenes in mammalian 
genomes, L1-dependent SINEs probably have features that make 
them especially efficient targets of the L1 reverse transcriptase. 

Although before the radiation of most mammalian orders L1 was 
at least as active as L2, the L2-dependent MIR was the only known 
(and very abundant) SINE of that time. All of the currently active 
SINEs in different mammalian orders appear to have arisen after the 
demise of L2 (and consequently MIR), as though an opportunity 
(or necessity) arose for the creation and expansion of other SINEs. 

Four different SINEs are distinguished in rat and mouse. The B1 
element seems to share its origin from a 7SL RNA gene with the 
primate Alu 171 . This probably happened just before the rodent-primate 
split and after the speciation from most other eutherians, 
where Alu/B1 elements are not known. The other SINEs are rodent-specific 
and have tRNA-like internal promoter regions. ID elements 
consist only of this tRNA-like region, which in older ID copies 
closely matches an Ala-tRNA from which it may have been derived. B4 
resembles a fusion of an ID and B1 SINE. Finally, B2 has a tRNA-like 
region of unknown affiliation followed by a unique 120 bp region. 

The fortunes of these SINEs during mouse and rat evolution have 
been different (Fig. 12). B4 probably became extinct before the 
mouse-rat speciation, while B2 has remained productive in both 
lineages, scattering >100,000 copies in each genome after this time. 
Interestingly, the fate of the B1 and ID SINEs has been opposite in 
rat and mouse. While B1 is still active in mouse, having left over 
200,000 mouse-specific copies in its trail, the youngest of the 40,000 
rat-specific B1 copies are 6-7% diverged from their source, indicating 
a relatively early extinction in the rat lineage. On the other 
hand, after the mouse-rat split only a few hundred ID copies may 
have inserted in mouse, whereas this previously minor SINE 
(~60,000 copies predate the speciation) increased its activity in 
rat to produce 160,000 ID copies. 

Co-localization of SINEs in rat and mouse 

Despite the different fates of SINE families, the number of SINEs 
inserted after speciation in each lineage is remarkably similar: 
~300,000 copies. Reminiscent of the replacement of MIR by 
L1-driven SINEs, it seems that the demise of B1 in rat allowed the 
expansion of IDs. Moreover, these independently inserted and 
unrelated SINEs (ID and B1 share only a mechanism of retroposition) 
accumulated at orthologous sites: the density of rat-specific 
SINEs in 14,243 ~100 kb windows in the rat genome is 
highly correlated (R^2 = 0.83) with the density of mouse-specific 
SINEs in orthologous regions in mouse. To avoid including 
elements fixed before the speciation, only SINEs that were labelled 
lineage-specific on the basis of subfamily assignment (Methods 89 ) 
and whose divergence from the consensus was well below the 
9% average for neutral sites (Fig. 5) were tallied. These data 
corroborate and refine the observation of a strong correlation 
between the location of primate- and rodent-specific SINEs in 
1 Mb windows 3 . At 100 kb, no correlation is seen for interspersed 
repeats other than SINEs. 
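
The window-based comparison above amounts to computing a squared Pearson correlation between two vectors of per-window SINE densities. A minimal sketch follows; the function name and the toy density values are hypothetical, and the real analysis used 14,243 orthologous windows rather than four.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between paired window densities."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with >= 2 windows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("densities must vary across windows")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical lineage-specific SINE counts per orthologous window.
rat_density = [12, 30, 7, 21]
mouse_density = [10, 33, 9, 18]
print(round(r_squared(rat_density, mouse_density), 3))
```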

Insertions of SINEs at the same location in different species have 
been reported 172-174 , and the correlation could reflect the existence 
of conserved hotspots for SINE insertions. However, only five of 
~800 human-specific Alu elements have an Alu inserted within 
100-200 bp in any of six other primate lineages 174-176 . Likewise, gene 
conversions of shared Alus into lineage-specific copies were 
observed five times in the same set, too low a level to contribute 
significantly to the observed correlation 174-176 . 

Figure 9c displays the lineage-specific SINE densities on rat 
chromosome 10 and in the mouse orthologous blocks, showing a 
stronger correlation than any other feature. The cause of the 
unusual distribution patterns of SINEs, accumulating in gene-rich 
regions where other interspersed repeats are scarce, is apparently a 
conserved feature, independent of the primary sequence of the SINE 
and effective over regions smaller than isochores. 

In the human genome, the most recent (unfixed) Alus are 
distributed similarly to L1, whereas older copies gradually take on 
the opposite distribution of SINEs 2,164 . This suggested that SINEs 
insert in the same places as LINEs, and that the typical SINE pattern 
is due to selection (or deletion bias) rather than a mechanistic 
insertion bias shared by all (unrelated) SINEs, but not by LINEs that 
use the same insertion process. This led to a proposal that SINEs are 
preferentially maintained in regions where they can easily be 
expressed 2,164 : if so, this could be the local feature conserved between 
mammalian genomes that leads to the strong correlation of local 
SINE densities in different mammals. However, we did not observe 
this temporal shift in SINE distribution pattern in mouse, nor 
currently in the rat genome, despite a considerable effort to define 
the potentially unfixed SINEs in both species (see ref. 89 for details). 
The observations in human could reflect a recent change in Alu 
behaviour, which would necessitate another explanation for the 
contrary insertion-preference of older Alus and all other SINEs. 

Some regions of high LINE content coincide with regions that 
exhibit both higher AT content and an increased rate of point 
substitution (Fig. 9, pink rectangles). In a genome-wide analysis, 
LINE content correlates strongly with substitution rates, and about 
80% of this correlation is explained by higher rates in AT-rich 
regions 89 . SINE density shows the opposite correlation both on 
chromosome 10 (Fig. 9) and genome-wide 89 . 

These phenomena, in conjunction with an overall trend in 
substitution rates towards AT-richness, suggest a model in which 
quickly evolving regions accumulate a higher-than-average AT 
content, which attracts LINE elements. Although distinct cause- 
effect relationships such as this remain largely speculative, these 
results reinforce the idea that local genomic context strongly shapes 
local genomic features and rates of evolution. 

Endogenous retroviruses and derivatives 

The other major contributors to interspersed repeats in the rodent 
genome are retrovirus-like elements. These have long terminal 
repeats (LTRs) of several hundred base pairs, containing 
transcriptional regulatory sequences, that flank an internal 
sequence that, in autonomous elements, encodes all proteins 
necessary for retrotransposition. All mammalian LTR elements are 
endogenous retroviruses (ERVs) or their non-autonomous derivatives. 
They fall into three groups, of which 
representatives in mouse are: murine leukaemia virus (MuLV) 
(class I), intracisternal A-particle (IAP) and MMTV (class II), and 
MERVL (class III). 

The most productive retrovirus in mammals has been the class III 
element ERV-L, primarily through its ancient non-autonomous 
derivatives, called MaLRs, with 350,000 copies occupying ~5% of 
the rat genome (Table 7). Human ERV-L and MaLR copies are >6% 
diverged from their reconstructed source genes and must have died 
out around the time of human speciation from New World 
monkeys. In mouse, several thousand almost identical MaLR and 
ERV-L copies suggest sustained activity 177-179 . In contrast, rat ERV-L 
activity must have been silenced a few million years ago, given that 
the least diverged MaLR and ERV-L (MTB_Rn and MT2_Rat1) 
copies differ by >4% from each other. Other class III ERVs were 
active earlier in rodent evolution, before the mouse-rat speciation. 

In contrast to class III ERVs, class I and class II elements still 
thrive in rat. We reconstructed four rat-specific autonomous class I 
ERVs, of which two appear still active, and nine class II ERVs, of 
which four may still be active. The non-autonomous NICER and 
RAL elements represent over 60% of all rat-specific class I elements. 
The autonomous drivers of this group, RNNICER2 and 3, with 
several intact copies, are closely related to the mouse-specific MuLV. 
Among the potentially active autonomous class II ERVs are 
MYSERV_Rn, related to the Mys element in Peromyscus, and several 
IAP elements, one with a full-length envelope gene. The most 
prolific, still-active class II ERV, RNERVK3, is distantly related to 
the simian retroviruses and, like ERV-L and NICER, has spawned 
abundant non-autonomous elements characterized by closely 
related LTRs. 

Simple repeats 

Whereas the above interspersed repeats derive from transposed 
sequences, mammalian genomes also contain interspersed simple 
sequence repeats (SSRs), regions of tandemly repeated short 
(1-6 bp) units that probably arise from slippage during DNA 
replication and can expand and compress by unequal crossing 
over. Remarkable differences were noted between the SSR contents 
of the human and mouse genomes 3 . Three times as many base pairs 
are contained in near-perfect (>90%) SSRs in mouse as in 
human, and a 4-5-fold excess was revealed when excluding SSRs 
contained in or seeded by interspersed repeats (primarily SSRs 
derived from the poly A or simple repeat tails of SINEs and LINEs). 
SSRs are both more frequent and on average longer in mouse. 
Polypurine (or polypyrimidine) repeats are especially (tenfold) 
over-represented in the mouse genome. As discussed above, this 
contrasts sharply with the greater frequency of triplet repeats coding 
for amino acids in human than in the rodents. 

Figure 12 Historical view of rodent repeated sequences. Relationships of the major 
families of interspersed repeats (Table 7) are shown for the rat and mouse genomes, 
indicating losses and gains of repeat families after speciation. The lines indicate activity as 
a function of time. Note that HAL1-like elements appear to have arisen in both the mouse 
and rat lineages. 

Rat and mouse SSR contents show, perhaps not surprisingly, 
much smaller differences. They represent almost the same amount 
of the rat and mouse genomes (for >90% perfect elements, ~1.4% 
compared with 0.45% in human) and are of similar average length; 
for example, the average >90% perfect (CA)n repeat, the most 
common SSR in mammals, is 42 bp long in mouse and 44 bp in rat. 
Some potentially significant differences are that polypurine SSRs are 
of similar average length but are 1.2-fold more common in mouse, 
whereas the rare SSRs containing CG dimers are 1.5-fold more 
frequently observed in rat. 
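
To make the SSR definition concrete, the sketch below scans a sequence for tandem repeats of 1-6 bp units. It is a simplification under stated assumptions: it reports only perfect repeats, whereas the figures above refer to >90% perfect elements, and the function name and the minimum copy number are hypothetical choices rather than the criteria used in the study.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for perfect tandem repeats.

    The lazy quantifier makes the regex prefer the shortest repeating
    unit, so a (CA)n run is reported with unit 'CA' rather than 'CACA'.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# A 20 bp (CA)n repeat embedded in short flanking sequence.
example = "GG" + "CA" * 10 + "TT"
print(find_ssrs(example))
```

Extending this to near-perfect repeats would require an alignment-based scorer that tolerates mismatches, which the regular expression above does not attempt.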




Figure 13 Adaptive remodelling of genomes and genes. a, Orthologous regions of rat, 
human and mouse genomes encoding pheromone-carrier proteins of the lipocalin family 
(α2u-globulins in rat and major urinary proteins in mouse) shown in brown. Zfp37-like zinc 
finger genes are shown in blue. Filled arrows represent likely genes, whereas striped 
arrows represent likely pseudogenes. Gene expansions are bracketed. Arrowhead 
orientation represents transcriptional direction. Flanking genes 1 and 2 are TSCOT and 
CTR1, respectively. b, Site-specific K_A/K_S analysis of rat α2u-globulins. Shown in red are 
side-chains from codons subject to positive selection. These have been mapped to a 
ribbon representation of the crystal structure of rat α2u-globulin chain A. 

Figure 14 Evolution of cytochrome P450 (CYP) protein families in rat, mouse and human. 
a, Dendrogram topology from 234 full-length sequences. 279 sequences of ≥300 amino 
acids; subfamily names and chromosome numbers are shown. Black branches have 
>70% bootstrap support. Incomplete sequences (they contain Ns) are included in counts 
of functional genes (84 rat, 87 mouse and 57 human) and pseudogenes (including 
fragments not shown; 77 rat, 121 mouse and 52 human). 64 rat genes and 12 
pseudogenes were in predicted gene sets. Human CYP4F is a null allele owing to an 
in-frame STOP codon in the genome, although a full-length translation exists (SwissProt 
P98187). Rat CYP27B, missing in the genome, is 'incomplete' because there is a RefSeq 
entry (NP_446215). Grouped subfamilies CYP2A, 2B, 2F, 2G, 2T and CYP4A, 4B, 4X, 4Z 
occur in gene clusters; thus nine loci contain multiple functional genes in a species. One 
(CYP1A) has fewer rat genes than human, seven have more rodent than human, and all 
nine differ in rodent copy numbers. CYP2AC is a rat-specific subfamily (orthologues are 
pseudogenes). CYP27C has no rodent counterpart. Rodent-specific expansion, rat CYP2J, 
is illustrated below. b, The neighbour-joining tree 224 , with the single human gene, 
contains clear mouse (Mm) and rat (Rn) orthologous pairs (bootstrap values >700/1,000 
trials shown). Bar indicates 0.1 substitutions per site. c, All rat genes have a single mouse 
counterpart except for CYP2J3, which has further expanded in mouse (mouse CYP2J3a, 
3b and 3c) by two consecutive single duplications. The genes flanking the CYP2J 
orthologous regions (rat chromosome 5, 126.9-127.3 Mb; mouse chromosome 4, 
94.0-94.6 Mb; human chromosome 1, 54.7-54.8 Mb) are hook1 (HOOK1; pink) and 
nuclear factor I/A (NFIA; cyan). Genes (solid) and gene fragments (dashed boxes) are 
shown above (forward strand) and below (reverse strand) the horizontal line. No orthology 
relation could be concluded for most of these cases. 

Figure 15 Comparative analysis of rat, mouse and human proteases. The complete 
non-redundant set of proteases and protease homologues from each species is distributed in 
five catalytic classes and 67 families. Each square represents a single protease, and is 
coloured according to its presence or absence in rat, mouse and human as indicated in 
the inset. 

Prevalent, medium-length duplications in rodents 

In addition to the transpositionally derived interspersed repeats and 
simple repeats detected by RepeatMasker and Tandem Repeat 
Finder, the rat and mouse genomes contain a substantial amount 
of medium-length unclassified duplications (typically 100- 
5,000 bp). These are readily seen in self-comparisons and in intra- 
rodent comparisons after masking the known repeats, but they are 
substantially less prevalent in comparisons with the human genome 
(Supplementary Information). Clearly, a substantial fraction of the 
rodent genomes consists of currently unexplained repeats and a full 
characterization awaits further studies. The unclassified duplications
may include: (1) novel families of low-copy rodent interspersed
repeats; (2) extensions of known but not fully characterized
rodent repeats; and (3) duplications generated by a mechanism 
different from transposition. 

Rat-specific biology 

A principal ambition of the RGSP was to reveal genetic differences 
between rats and mice that might specify their differences in 
physiology and behaviour. This view was well supported by the 
current draft sequence and predicted gene set. In particular, recently 
duplicated genes are enriched in elements involved in chemosensa- 
tion and functional aspects of reproduction (Table 5). Here we 
illustrate the differences in the gene complements of rat and mouse 
by in-depth analyses of olfactory receptors (ORs), pheromones, 
cytochromes P450, proteases and protease inhibitors. 

Chemosensation 

The ability to emit and sense specific smells is a key feature of 
survival for most animals in the wild. Another paper 180 describes the 
evolution of rat and mouse pheromones, vomeronasal receptors, 
and ORs whose genes were duplicated frequently during the time 
since the common ancestor of rats and mice (Table 5). Their study 
yielded over 200 aligned codons predicted to have been subject to 
adaptive evolution. They attribute the rapid evolution of these genes 
to conspecific competition — in particular, sexual selection. 

Using a homology-based identification procedure with manual 
curation 181, we found 1,866 ORs in 113 locations in the rat genome:
69 multi-gene clusters and 44 single genes. After adjusting for
missing sequences (the assembly covers 90.2% of the genome), we
extrapolate that there are ~2,070 OR genes and pseudogenes. The
rat therefore has ~37% more OR genes and pseudogenes than the
~1,510 ORs of the mouse 181,182, assuming similar representation of
recently duplicated sequences in the two genome assemblies used.
Of the 1,774 OR sequences that are not interrupted by assembly
gaps, 1,227 (69%) encode intact proteins, while the remaining 547
(31%) sequences are probably pseudogenes with in-frame stop
codons, frameshifts, and/or interspersed repeat elements. Fewer
mouse OR homologues are pseudogenes (~20%) 181,182, but the
larger family size in rat still leaves it with substantially more intact
ORs than the mouse (~1,430 versus ~1,210). Striking rat-specific
expansions of two ancestral clusters account for much of the 
difference in OR family size and pseudogene content between rat 
and mouse, although many other clusters exhibit more subtle 
changes (not shown). Significant differences between human and 
mouse OR families have also been reported 181-183 , but the functional 
implications of OR repertoire size on the ability of different species 
to detect and discriminate odorants are not yet known. 
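
The extrapolation in this paragraph can be retraced with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the observed OR count scales linearly with assembly coverage and that the intact fraction measured on uninterrupted sequences applies to the whole family. The variable names are ours; the input figures are those quoted in the text.

```python
# Figures quoted in the text (variable names are illustrative).
observed_ors = 1866        # ORs found in the rat assembly
assembly_coverage = 0.902  # fraction of the genome covered by the assembly
mouse_ors = 1510           # approximate mouse OR count
uninterrupted = 1774       # rat OR sequences not broken by assembly gaps
intact = 1227              # of those, sequences encoding intact proteins
mouse_pseudo_fraction = 0.20

# Assumption: OR count scales linearly with the fraction of genome assembled.
rat_total = observed_ors / assembly_coverage
excess_over_mouse = rat_total / mouse_ors - 1

# Assumption: the intact fraction of uninterrupted sequences holds genome-wide.
intact_fraction = intact / uninterrupted
rat_intact = rat_total * intact_fraction
mouse_intact = mouse_ors * (1 - mouse_pseudo_fraction)

print(f"extrapolated rat ORs: {rat_total:.0f}")
print(f"excess over mouse:    {excess_over_mouse:.0%}")
print(f"intact fraction:      {intact_fraction:.0%}")
print(f"intact ORs, rat vs mouse: {rat_intact:.0f} vs {mouse_intact:.0f}")
```

Rounded to the nearest ten, these values agree with the approximate figures given above.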

α2u-globulin pheromones

The α2u-globulin genes encode odorant-binding proteins that also
contribute to essential survival functions in animals. α2u-globulin
homologues are likely to be highly heterogeneous among murid
species. Several homologues (major urinary proteins) sequenced
from the BALB/c mouse are distinct from their C57BL/6J mouse
counterparts, and these also appear to be arranged differently along
its genome 184. Moreover, two full-length genes from other mouse
strains 185 differ from their C57BL/6J orthologues, either lacking
two of the bases or retaining 20 of the bases that render the
C57BL/6J sequences likely to be pseudogenes (not shown).

The evolution of α2u-globulin genes on rat chromosome 5 has
clearly driven a significant 'remodelling' of this genomic region
(Fig. 13a). The orthologous human genomic region contains a
single homologue, suggesting that the common ancestor of rodents
and human possessed one gene. The genome of C57BL/6J mice
contains four homologous genes and seven pseudogenes, whereas
the rat genome contains ten α2u-globulin genes and 12 pseudogenes
in a single region (Fig. 13a).

Phylogenetic trees constructed using amino acid and non-coding
DNA sequences show that, surprisingly, the rat α2u-globulin gene
clusters appear to have arisen recently via a rapid burst of gene
duplication since the rat-mouse split (Table 5; data not shown).
This is consistent with the Rfp37-like zinc-finger-like pseudogene
having uniquely 'hitchhiked' for virtually all of the rat-specific
α2u-globulin gene duplications (Fig. 13a). The sequences of these
genes are also evolving rapidly, with median KA/KS values of 0.77
and 1.06 for rat and mouse genes, respectively. Amino acid sites that
appear to have been subject to adaptive evolution are situated both
within the ligand-binding cavity and on the solvent-exposed
periphery of the α2u-globulin structure 139 (Fig. 13b). This
demonstrates how genome analysis can reveal the imprint of adaptive
evolution from megabase to single-base levels.

The rapid evolution of these genes, and the remodelling of
their genomic regions, can be attributed to the known roles of rat
α2u-globulins and mouse major urinary proteins in conspecific
competition and sexual selection. These proteins are pheromones
and pheromone carriers that are present in large quantities in
rodent urine, and act as scent markers indicating dominance and
subspecies identity 186,187.

Detoxification 

Cytochrome P450 is a well-recognized participant in metabolic 
detoxification, and we also observe rapid evolution within this 
family. These enzymes metabolize a large number of toxic and 
endogenous compounds 188 and thus are particularly relevant to 
clinical and pharmacological studies in humans. As rodents are 
important model organisms for understanding human drug 
metabolism, it is important to identify 1:1 orthologues and 
species-specific expansions and losses 189 . Compared with human 
genes, there are clear expansions of several rodent P450 subfamilies, 
but there are also significant differences between rat and mouse 
subfamilies (Fig. 14a). The fastest-evolving subfamily seems to be 
CYP2J, containing a single gene in human, but at least four in rat 
and eight in mouse (Fig. 14b, c). CYP2J enzymes catalyse the 
NADPH-dependent oxidation of arachidonic acid to various eico- 
sanoids, which in turn possess numerous biological activities 
including modulation of ion transport, control of bronchial and 
vascular smooth muscle tone, and stimulation of peptide hormone 
secretion 190 . The genomic ordering of genes and their phylogenetic 
tree indicate an ongoing expansion in the rodents (Fig. 14b, c). This 
suggests that adaptive evolution has been involved in diversifying 
their functions. Moreover, detailed study of the nuclear receptors, a 
highly conserved family of transcription factors, revealed that PXR 
and CAR, two nuclear receptors regulating CYP genes involved with 
detoxification 191 , have the two highest nucleotide substitution rates 
in their ligand binding domains, whereas SF-1, the nuclear receptor 
regulating CYP19 (ref. 192), which has not undergone expansion, is 
more conserved, like other nuclear receptors 193 . 

Proteolysis 

Protease and protease inhibitor genes also represent an example of 
rapid evolution in the rat genome. Proteases are a structurally and 
functionally heterogeneous group of enzymes involved in multiple 
biological and pathological processes 194. The rat contains 626
protease genes, ~1.7% of the rat gene count 124, more than human
(561) but similar to mouse (641) 125. Of the rat protease genes, 102
are absent from human, and 42 are absent from mouse (Fig. 15). 
Several rat gene families have expanded, including placental cath- 
epsins, testases, kallikreins and haematopoietic serine proteases; 
others appear to have formed pseudogenes in humans (Table 8). 
These protease families are mainly involved in reproductive or 
immunological functions, and have evolved independently in the 
rat and mouse lineages. 

The rat protease inhibitor complement contains 183 members, 
similar to mouse (199) but larger than human (156). As with the 
protease genes, the rapid evolution in protease inhibitors derives 
from differential expansions of specific families such as serpins and 
cystatins. The concomitant expansions in rat and mouse proteases 
and their inhibitors appear to reflect homeostasis of protein 
turnover. 

These gene family expansions dramatically illustrate how large- 
scale genomic changes have accompanied species-specific inno- 
vation. Positive selection of duplicated genes has afforded the rat 
an enhanced repertoire of precisely those genes that allow repro- 
ductive success despite severe competition from both within its 
own, and with other, species. This serves as a general illustration of 
the importance of chemosensation, detoxification and proteolysis 
in innovation and adaptation. 

Human disease gene orthologues in the rat genome 

A further strong motivation for sequencing the rat genome was to 
enhance its utility in biomedical research. Although the rat is 
already recognized as the premier model for studying the physio- 
logical aspects of many human diseases, it has not had as prominent 
a role in the study of simple genetic disease traits. As more than 
1,000 human mendelian disorders now have associated loci and 
alleles, there is now a tremendous opportunity to link the new 
knowledge of the rat genome with data from the human disease 
examples. The precise identification of the rat orthologues of 
human genes that are mutated in disease creates further opportu- 
nities to discover and develop rat models. 

Predicted rat genes were compared with 1,112 well-characterized
human disease genes 195 that were verified and classified on the basis
of pathophysiology (H.H., E.E.W., H.W., K.G.W., H.X., L.G., P.D.S.,
D.N.C., D.S., M.M.A., C.P.P. and K.F., unpublished work). As
predicted by Ensembl, 844 (76%) have 1:1 orthologues in the rat.
These predictions are likely to be of high quality because 97.4% of
the 11,422 rat-human 1:1 orthologues predicted by Ensembl were
found in orthologous genomic regions.

We asked if these 'disease orthologue' pairs were distinguishable 
from other rat-human orthologues. Ensembl automatically pre- 
dicts that 11,522 human genes have rat 1:1 orthologues (corre- 
sponding to 46% of all Ensembl predicted human genes). By 
contrast, a much higher proportion (76%) of human disease 
genes have Ensembl-predicted rat 1:1 orthologues. Careful analysis 
of the remaining 268 human genes that were not predicted by 
Ensembl to show 1:1 orthology indicated that only six of the human
disease genes lack likely rat orthologues among genome, cDNA, EST 
and protein sequences 196 . Thus, it appears that, in general, genes 
involved in human disease are unlikely to have diverged, or to have 
become duplicated, deleted or lost as pseudogenes, between rat and 
human (conservation of orthologues discussed above). 
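
The proportions quoted in this comparison follow directly from the counts. The sketch below simply recomputes them; the variable names are ours, and the total number of Ensembl-predicted human genes is not stated in the text but is implied by the 46% figure.

```python
# Counts quoted in the text (variable names are illustrative).
disease_genes = 1112              # well-characterized human disease genes
disease_with_orthologue = 844     # of those, with an Ensembl 1:1 rat orthologue
all_human_with_orthologue = 11522
all_human_fraction = 0.46         # stated share of Ensembl human genes
lacking_after_review = 6          # disease genes with no likely rat orthologue

disease_fraction = disease_with_orthologue / disease_genes
not_predicted = disease_genes - disease_with_orthologue

# Implied (not stated) size of the Ensembl human gene set.
implied_human_genes = all_human_with_orthologue / all_human_fraction

# Share of disease genes with a likely rat orthologue after manual review.
retained_after_review = 1 - lacking_after_review / disease_genes

print(f"disease genes with 1:1 orthologue: {disease_fraction:.0%}")
print(f"disease genes left for manual review: {not_predicted}")
print(f"implied Ensembl human gene count: ~{implied_human_genes:,.0f}")
print(f"with likely orthologue after review: {retained_after_review:.1%}")
```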

We next compared KS, KA and the KA/KS ratio values of 'disease
orthologues' with those of all remaining orthologue pairs. Only the
KS distributions differed significantly 196, suggesting that coding
regions of human disease genes and their rat counterparts have 
mutated more rapidly than the non-disease genes. This might result 
from factors influencing the specific loci, or the disease genes may 
characteristically reside in genomic regions that exhibit higher 
mutation rates. 

The disease gene set was next grouped into 16 disease-system
categories and analysed using a non-parametric test for KA/KS
(human/rat) 196 (Fig. 16). Only five disease systems exhibited
significant KA/KS differences with respect to the remaining samples
(P < 0.05). Neurological and malformation-syndrome disease
categories manifested the lowest median KA/KS ratios, which are
consistent with purifying selection acting on these gene sets. With
a standardized difference from the null expectation,
(Mean-Mean0)/Std0, of -4.63 (P < 0.0001), the neurological disease
gene set showed the strongest evidence for purifying selection among
the disease gene categories examined. In contrast, the pulmonary,
haematological and immune categories manifested the highest median
KA/KS ratios, and the genes of the immune system disease category,
with a value for (Mean-Mean0)/Std0 of 4.98 (P < 0.0001), show the
highest KA/KS ratios. These results are consistent with a role for
more positive selection, or reduced selective constraints, among
these genes.
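
To make the form of the statistic concrete, the following is a minimal sketch of a two-level rank-sum comparison (one disease category against all remaining genes), standardized as (Mean-Mean0)/Std0. It is an assumption on our part that the reported values were obtained this way; the exact multi-level procedure is described in ref. 196. The sketch uses the normal approximation without a tie correction, and the KA/KS values in the example are invented.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def standardized_rank_sum(group, rest):
    """Rank sum of `group` expressed as (W - Mean0) / Std0, plus a two-sided P."""
    n1, n2 = len(group), len(rest)
    ranks = average_ranks(list(group) + list(rest))
    w = sum(ranks[:n1])                              # observed rank sum
    mean0 = n1 * (n1 + n2 + 1) / 2                   # expectation under the null
    std0 = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # null s.d., no tie correction
    z = (w - mean0) / std0
    p = math.erfc(abs(z) / math.sqrt(2))             # normal approximation
    return z, p

# Invented KA/KS values, for illustration only.
constrained = [0.05, 0.08, 0.10]
others = [0.20, 0.30, 0.40, 0.50]
z, p = standardized_rank_sum(constrained, others)
print(f"z = {z:.2f}, P = {p:.3f}")
```

A negative value indicates that the category ranks lower in KA/KS than the remaining genes, as reported for the neurological set; a positive value indicates the opposite, as reported for the immune set.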

Table 8 Protease-expanded gene families and pseudogenes in rat, mouse and human genomes

Protease                            Rat gene / locus         Human gene / locus    Mouse gene / locus       Function
Absent genes in assembly            13 from 626 (2.07%)      5 from 561 (0.89%)    5 from 641 (0.78%)

Expanded families
Placental cathepsins                10 genes / 17p14         Absent                8 genes / 13B3           Reproduction
Testins                             3 genes / 17p14          Absent                3 genes / 13B3           Reproduction
Glandular kallikreins               10 genes / 1q21          Absent                15 genes / 7B2           Reproduction
Mast cell chymases/granzymes        28 genes / 15p13         4 genes / 14q11       17 genes / 14C1          Host defence

Human pseudogenes
Chymosin                            1 gene / 2q34            1 ps / 1p13           1 gene / 3F3             Digestion
Distal intestinal serine proteases  2 genes / 10q12          1 ps / 16p12          2 genes / 17A3           Digestion
Pancreatic elastase                 1 gene / 7q35            1 ps / 12q13          1 gene / 15F3            Digestion
Fertilins and reproductive ADAMs    7 genes / various loci   6 ps / various loci   8 genes / various loci   Reproduction
Testases                            4 genes / 16q12          3 ps / 8p22           9 genes / 8B1            Reproduction
Testis serine proteases             5 genes / various loci   5 ps / various loci   6 genes / various loci   Reproduction
Implantation serine proteases       2 genes / 10q12          1 ps / 16p13          2 genes / 17A3           Reproduction
Airway trypsin-like proteases       3 genes / 14p21          3 ps / 4q13           3 genes / 5E1            Host defence

Rat pseudogenes
Calpain 13                          1 ps / 6q12              1 gene / 2p23         1 gene / 17E2            Reproduction ?
Pyroglutamyl-peptidase II           1 ps / 1q22              1 gene / 15q26        1 gene / 7C              Metabolism
Gln-fructose-6-P transamidase 3     1 ps / Xq14              1 gene / Xq21         1 ps / XC3               Metabolism
Aminopeptidase MAMS/L-RAP           1 ps / 1q12              1 gene / 5q15         1 ps / 17A3              Host defence
Carboxypeptidase O                  1 ps / 9q31              1 gene / 2q33         1 ps / 1C2               Unknown
Procollagen III N-endopeptidase     1 ps / 19q12             1 gene / 16q24        1 ps / 8E2               Metabolism ?
Kallikrein-2 and -3                 2 ps / 1q21              2 genes / 19q13       1 ps / 7B2               Reproduction
Testis-specific protein 50          1 ps / 8q32              1 gene / 3p21         1 gene / 9F2             Reproduction

Where possible, we further considered conservation of these
pathophysiology-based gene sets among orthologues of more
diverse phyla, including mouse, fish, fly, nematode worm and
yeast orthologues. Overall, we obtained results consistent with those
reported here for these rat:human 1:1 orthologous gene disease 
categories 196 . These results demonstrate that the individual genes 
that constitute various disease systems exhibit significantly different 
average evolutionary rates. The higher evolutionary rates noted for 
the immune system disease genes are consistent with a previous 
finding that lymphocyte-specific genes evolve relatively rapidly 197 
and may indicate rapid diversification of the functions of the 
immune systems of rodents and humans. This is expected for 
genes involved in controlling species-restricted infectious agents if
strong adaptive pressure acts during host-pathogen co-evolution. 
Thus, the results of studies of these rodent genes may be less directly 
relevant to our understanding of human immune system diseases 
than results obtained for other pathophysiology disease systems 
where conservation is greater and purifying selection is stronger. 

We have also specifically examined a number of genes that 
harbour triplet nucleotide repeats, and are involved in human 
neurological disorders such as Huntington's disease, a condition 
known to be caused by CAG triplet repeat expansion producing 
abnormally long polyglutamine tracts in an otherwise normal 
protein 198 . Analysis of the rat-human orthologues of these disease 
genes indicated that repeat-expansion disease genes exhibit a repeat 
length that is substantially shorter in the rat than that found in the 
normal human gene (Fig. 17). In all cases, human disease genes 
localize below the line demarcating 1:1 length correlation, showing 
that rat orthologues uniformly bear shorter repeats. At present, 
there are no naturally occurring rat strains described that exhibit 
neurological disease associated with repeat-expansion mechanisms. 
The shorter repeat length of these orthologues in the rat would be 
consistent with either the lack of repeat-expansion mutational
mechanisms in the rat or the failure of these orthologues to achieve 
a 'critical repeat length' susceptible to such mutational mechanisms. 
Other human genes, not at present known to be associated with 
disease, also contain glutamine repeats that are much shorter in the 
rat orthologues, and thus, could be investigated as potential disease 
candidates 196. These triplet-repeat-bearing genes may be susceptible
to mutations that arise through repeat-expansion mechanisms. In 
Fig. 17, it may also be observed that a relatively high proportion of 
repeats are significantly longer in the rat than in their corresponding 
human orthologue. 
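
A comparison of this kind can be sketched as follows. The example is hypothetical: it assumes that repeat length is measured as the longest uninterrupted run of glutamine (Q) residues in each protein and that a pair is examined when either run exceeds ten residues. The helper names and the toy sequences are ours, not data from this study.

```python
import re

def longest_polyq(protein):
    """Length of the longest uninterrupted run of Q in a protein sequence."""
    runs = re.findall(r"Q+", protein.upper())
    return max((len(run) for run in runs), default=0)

def compare_repeats(pairs, min_len=10):
    """Compare human and rat poly-Q lengths for orthologue pairs.

    `pairs` maps a gene name to (human_sequence, rat_sequence). Pairs are
    reported only if either species has a run longer than `min_len`.
    """
    report = {}
    for name, (human_seq, rat_seq) in pairs.items():
        human_len = longest_polyq(human_seq)
        rat_len = longest_polyq(rat_seq)
        if max(human_len, rat_len) <= min_len:
            continue
        if rat_len < human_len:
            verdict = "shorter in rat"
        elif rat_len > human_len:
            verdict = "longer in rat"
        else:
            verdict = "equal"
        report[name] = (human_len, rat_len, verdict)
    return report

# Toy sequences, invented for illustration.
toy_pairs = {
    "geneA": ("M" + "Q" * 21 + "PPLA", "M" + "Q" * 8 + "PPLA"),
    "geneB": ("MKQQQLT", "MKQQLT"),  # runs too short to be examined
}
for name, (h, r, verdict) in compare_repeats(toy_pairs).items():
    print(f"{name}: human {h}, rat {r} ({verdict})")
```

Plotting the human length against the rat length for each reported pair would give a scatter of the kind shown in Fig. 17, with points below the 1:1 line corresponding to repeats that are shorter in rat.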



[Figure 16 panel labels: Immune, P < 0.0001; Malformation, P < 0.0001; Neurological, P < 0.0001; Haematological, P = 0.0004; Pulmonary, P = 0.02. Axis: purifying selection, from increasing to decreasing.]



Figure 16 Selective constraints differ for human disease systems in the rat genome.
Human disease system categories showing significant differences (P < 0.05) in a
non-parametric test (Mann-Whitney-Wilcoxon) comparing KA/KS (human:rat) ratios. P values
from two-level tests between genes from one disease system and the remaining genes.
(Mean-Mean0)/Std0 values from multi-level tests from 16 categorized disease systems.
Negative values (shown in yellow and orange) for neurological (-4.63) and
malformation-syndrome (-4.04) categories were observed to be consistent with KA/KS ranges in
which purifying selection predominates. Immune, haematological and pulmonary
categories show positive values of 4.98, 3.59 and 2.34, respectively (for complete data
set and details, see ref. 199).



In addition to enabling the direct comparison of rat-human 
disease orthologues, the rat genome sequence itself is an invaluable 
aid for the discovery of additional rat genes that can be studied as 
disease models. Two general modes can now be pursued. First, genes 
underlying disease phenotypes with simple inheritance that have 
been mapped to chromosomal regions can be more easily pursued 
in both species. Indeed, the rearrangements of conserved segments 
between the two species in this map were found to have significant 
value, because they tighten the boundaries of the mapped disease 
regions and thus reduce the number of genes that could potentially 
be associated with a given disease phenotype 113 . Second, the 
identification of multiple alleles contributing to quantitative and 
complex trait differences that are involved in disease processes can 
be pursued with more accuracy, both in the initial association 
phases, and in subsequent efforts to detect causative alleles. 

Rat single nucleotide polymorphisms 

The discovery and cataloguing of the natural DNA variation that 
persists between individual rat strains will allow further research 
using rat model systems. Although many rat microsatellites have 
been characterized and studied, single nucleotide polymorphisms 
(SNPs) are of more general interest because of their probable 
ubiquity, and the ease with which they can be assayed. SNP data 
have three broad applications: (1) individual markers can be
used in ongoing efforts to associate phenotypes that have complex
underlying genetic components with specific sites in the genome;
(2) a panel of such markers can be used in conjunction with
selective breeding and chromosome mechanics to generate rat
strains that are amenable to the kinds of manipulations that will
hasten the discovery of important alleles; and (3) a set of such markers
can be used to detail the history of the different genomic events that
have led to the structure of the genomes of contemporary rat strains.
A detailed map of these events has a utility analogous to the current
human haplotype (HapMap) mapping project 199 and will probably
aid disease gene identification, as recently suggested for the
mouse 200.

Figure 17 Polyglutamine repeat length comparison between human and rat. Points
represent protein poly-Q length for rat and human. Red points correspond to repeats in
genes associated with human disease: SCA1, spinocerebellar ataxia 1 protein, or ataxin 1;
SCA7, spinocerebellar ataxia 7 protein; MJD, Machado-Joseph disease protein; CACNA1A,
spinocerebellar ataxia 6 protein, or calcium channel alpha 1A subunit isoform 1; DRPLA,
dentatorubral pallidoluysian atrophy protein; HD, Huntington's disease protein, or
huntingtin; TBP, TATA binding protein or spinocerebellar ataxia 17 protein. Repeat lengths
over ten were examined; green shading delineates the range not included in our analysis.
Also noted are a set that are expanded in rat and human (black circle) and a set where
repeats are expanded in the rat.

The Rnor3.1 draft sequence was generated primarily from DNA 
of a single inbred rat line. This maximized the likelihood of deriving 
an accurate sequence assembly, but reduced any likely discovery of 
natural variation in this phase of the project. As a consequence there 
has been no large-scale public SNP discovery from rat genomic 
sequencing. A pilot project based on coding (c)SNP discovery has 
been initiated, however 201 , as these cSNPs represent a particularly 
important subset of variants that may have direct functional 
significance 202 . These data have illustrated both immediate appli- 
cations and the long-term potential for an effort aimed at compre- 
hensive SNP discovery. 

Conclusions 

As the third mammalian genome to be sequenced, the rat genome 
has provided both predictable and surprising information about 
mammalian species. Although it was clear at the outset of this 
programme that ongoing rat research would benefit from the 
resource of a genome sequence, there was uncertainty about how 
many new insights would be found, especially considering the 
superficial similarities between the rat and the already sequenced 
mouse. Instead, the results of the sequencing and analysis have 
generated some deep insights into the evolutionary processes that 
have given rise to these different species. In addition, the project has 
been invaluable in further developing the methods for the gener- 
ation and analysis of large genome sequence data sets. 

The generation of the rat draft tested the new 'combined
approach' for large genome sequencing. As the overall assembly is
of high quality, there is no doubt that this overall strategy, and the 
supporting software we have developed, provides a suitable 
approach for this problem. Because we included a BAC 'skimming' 
component in the underlying data set, the assembly recovered a 
fraction of the genome that was expected, by analogy to the mouse 
project, to be difficult to assemble from pure WGS data. In addition,
the BAC skimming component allowed progressive generation of 
high-quality local assemblies that were of use to the rat research 
community as the project developed. On the other hand, although 
the BAC component used here was far less expensive than the fully 
ordered and highly redundant set used in the hierarchical approach 
to sequencing the human genome, it nevertheless increased the 
overall cost of data production relative to a WGS approach. 

The issue of efficacy of WGS versus other approaches to the 
sequencing of large genomes remains a matter of earnest scientific 
debate. In ongoing projects at different centres that participated in 
the RGSP consortium, different approaches are being used to tackle 
new genomes. These include pure WGS methods, the combined 
approach and variations on that methodology. The future appli- 
cation of the different procedures depends on the target genome 
sizes, the expected degree of heterogeneity (that is, polymorphism) 
in the organism to be sequenced, and the preferences of the 
individual centre. So far, all the genomes that have been analysed 
by RGSP consortium members have been of high quality and we 
anticipate that this will continue as the benefits and disadvantages of 
different approaches are further studied and analysed. 

The rat genome data have improved the utility of the rat model 
enormously. Now that near-complete knowledge of the rat gene 
content is realizable, individual researchers have a data source for 
the rat 'parts list' that can be explored with the high degree of 
confidence and precision that is appropriate for biomedical 
research. A similar improvement has been made in the resources 
for physical and genetic mapping, because the relative position of 
individual markers is now known with high confidence and there 
are now computational resources to bridge the process of genetic 
association with gene modelling and experimental investigation. 
These advances have been reflected by measured increases in the use 
of all the rat-specific public genome data sets that can be accessed
online, as well as by the informally assessed increases in overall
'genomic' research of this model. 

The expected benefit of a third mammalian sequence providing 
an outgroup by which to discriminate the timing of events that had 
already been noted between mouse and human was fully realized. 
Using the three sequences and other partial data sets from 
additional organisms, it was possible to measure some of the overall 
faster rate of evolutionary change in the rodent lineage shared by 
mice and rats, as well as the peculiar acceleration of some aspects of 
rat-specific evolution. The observation of specific expanded gene 
families in the rat should provide material for targeted studies for 
some time. 

At this time there is no plan to further upgrade or finish the rat 
genome sequence. This programme decision is a consequence of the 
high cost of converting draft sequence to finished data, and the 
pressing need to analyse new genomes. However, as the distant 
objective of very-low-cost sequencing or other advances that can 
improve draft sequences inexpensively are realized, it might be 
envisioned that a rat sequence that approaches the quality of 
the current human data will be produced. A finished rat genome 
may answer many questions, as specific clues already show that 
areas of the genome that are most difficult to resolve in a 
random sequencing project are also those areas that are most 
dynamic, and therefore of high potential interest in an evolutionary 
context. 

Despite the advances represented here, we are clearly still at the 
beginning of the full analysis of the mammalian genome and its 
complex evolutionary history. Much of the additional data that are 
required to complete this story will be from other genomes, 
distantly related to rat. Nevertheless, a considerable body of data 
remains to be developed from this species. In addition to the distant 
prospect of a finished rat genome, analysis of other rat strains may 
yield genome-wide polymorphism data, while targeted efforts to 
generate cDNA clone collections will provide rat-specific reagents 
for routine use in research. Together with the ongoing efforts to fully 
develop methods to genetically manipulate whole rats and provide 
effective 'gene knockouts', the current and future rat genome 
resources will ensure a place for this organism in genomic and 
biomedical research for some time. □ 

Methods 

DNA sequencing and data access 

Paired-end reads from BAC and WGS libraries were produced as previously described 2,203 . 
Unprocessed sequence reads are available from the NCBI Trace Archive
(ftp://ftp.ncbi.nih.gov/pub/TraceDB/rattus_norvegicus/); raw eBAC assembly data are available
from the BCM-HGSC (http://www.hgsc.bcm.tmc.edu/Rat/); and the released Rnor3.1
assembly is available from the BCM-HGSC (ftp://ftp.hgsc.bcm.tmc.edu/pub/analysis/rat/),
the NCBI (ftp://ftp.ncbi.nih.gov/genomes/R_norvegicus), and the UCSC
(http://genome.ucsc.edu/downloads.html).

Genome assembly 

Assembly of the rat genome by the Atlas system is described in detail elsewhere 54. Earlier
assemblies (Rnor2.0/2.1) of the initial data set were based on 40 million total reads and
19,000 BAC skims. These assemblies spanned 2.66 Gb and comprised over 900 ultrabactigs
with N50 of over 5 Mb. They differed only in the removal of short artefactual duplications
from Rnor2.0. Rnor3.1 includes another 1,100 BACs, selected to fill gaps in Rnor2.1.
Because of the comprehensive coverage of the genome by Rnor2.0/2.1, it was used for the 
initial predictions of genes and proteins. 
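
For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows the usual definition: the length L such that sequences of length L or greater contain at least half of the total assembled bases. The function name and the example lengths are ours; this is not the code used by the Atlas system.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one sequence length")

# Invented lengths (arbitrary units), for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

An N50 of over 5 Mb therefore means that at least half of the assembled sequence lies in ultrabactigs longer than 5 Mb.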

BAC fingerprints 

An agarose-gel-based fingerprinting methodology 204-207 was employed to generate HindIII
fingerprints from 199,782 clones in the CHORI-230 BAC library. The contig assembly was
subjected to manual review and editing to refine clone order within contigs and to make
merges between contigs, using tools provided in the FPC software 208-210. Fingerprints for
5,250 RPCI-31 PACs 211 and RPCI-32 BACs were subsequently added to allow correlation
between the fingerprint map and a developing YAC map of the rat genome. BAC and PAC
clones are available through BACPAC Resources at CHORI (bacpacorders@chori.org).

BAC, PAC and YAC maps 

Markers generated from BAC and PAC clones were hybridized against YAC SH (R.D., 
Pmatch, unpublished software) and radiation hybrid libraries 61,212 to produce 
independent maps that were subsequently combined. Genetic markers from two rat
genetic maps 61 and the radiation hybrid map 5S were aligned to the Rnor3.1 assembly
using BLAT 123 (when sequence was available) or electronic polymerase chain reaction
(EPCR) 213.

Finished sequence used for quality assessment of the assembly 

To assess the accuracy of the Atlas assembly, the Rnor3.1 sequence was compared to 13 Mb 
of sequences that had been finished to high quality. 

Large-scale rearrangements 

We compared these assemblies: Human (April 2003, NCBI build 33); Mouse (February
2003, NCBI build 30); and Rat (June 2003, Rnor3.1). Repeats were masked using
RepeatMasker (A.S. & P. Green, unpublished work; see
http://ftp.genome.washington.edu/RM/RepeatMasker.html) and TandemRepeatFinder 214. Local alignments
were produced using PatternHunter 70 (Supplementary Information). Repeat
contamination was removed and the remaining similarities combined into two- and
three-way anchors 73 and synteny blocks produced at various resolutions using
GRIMM-Synteny 71.

Genome-wide visualization of conserved synteny 

Pairwise comparisons of the genomes of human, mouse and rat using MULTIZ 69,215, 
MLAGAN 216,217, MAVID' 10, PatternHunter 70 and Pash 72 were merged into blocks of 
conserved synteny 69,71,72, and the 1-Mb-resolution images were displayed using the Virtual 
Genome Painting method (M.L.G.-G. et al., unpublished work; 
http://www.genboree.org). 

Rat segmental duplications 

Segmental duplications >5 kb were identified, extracted and aligned as described 218, and 
paralogous sequence relationships were assessed using PARASIGHT visualization software 
(J.A.B., unpublished work; Supplementary Information). 

Venn diagram 

Pairwise and three-way alignments generated using BLASTZ 219 and MULTIZ 215 or 
HUMOR 215 were analysed to classify each nucleotide in the three genomes by the species 
with which it aligns: in all three species, aligning between human and rat (but not mouse), 
between human and mouse (but not rat), or between mouse and rat (but not human). 
Other nucleotides are species-specific; unassigned nucleotides occupying gaps in the 
genome assemblies were excluded. On the basis of output from RepeatMasker 164 and 
RepeatDater 89 , nucleotides were assigned to categories (of non-repetitive, repetitive with a 
certain ancestry, or repetitive but unassigned) and counted. See Supplementary Table SI-1 
for details. 
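
The per-nucleotide classification described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the set-based input, and the use of `None` to mark assembly gaps are hypothetical, and the real analysis worked from BLASTZ and MULTIZ alignment output rather than from hand-built sets.

```python
from collections import Counter

SPECIES = ("human", "mouse", "rat")


def venn_category(species, aligned_to):
    """Classify one nucleotide of `species` by the other species it aligns to."""
    if species not in SPECIES:
        raise ValueError(f"unknown species: {species}")
    partners = set(aligned_to) - {species}
    if not partners <= set(SPECIES):
        raise ValueError(f"unknown partner species: {partners}")
    if not partners:
        return f"{species}-specific"
    if len(partners) == 2:
        return "all three species"
    (partner,) = partners
    return " and ".join(sorted((species, partner))) + " only"


def count_categories(species, per_base_alignments):
    """Tally categories; None marks a gap base, which is excluded."""
    return Counter(
        venn_category(species, aligned_to)
        for aligned_to in per_base_alignments
        if aligned_to is not None
    )


# Hypothetical toy input: five rat bases, one of which lies in an assembly gap.
toy_rat_bases = [{"human", "mouse"}, {"mouse"}, set(), None, {"human"}]
print(count_categories("rat", toy_rat_bases))
```

The repeat-ancestry categories from RepeatMasker and RepeatDater would be a second, independent label on each base and are not modelled here.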

Gene prediction 

ENSEMBL transcript models were built from 28,478 rodent proteins that were aligned to 
the genome using a combination of Pmatch (R.D., unpublished software), BLAST 220 and 
GeneWise 221 . Models based on 5,083 vertebrate proteins were added in regions without 
rodent-protein-based models. UTRs were added using 11,170 transcripts built from 8,615 
different rat cDNAs aligned to the genome using BLAT, with coverage ≥90% and identity 
≥95%. This procedure (as described 1 ' 2 but without GENSCAN predictions) gave rise to 
18,241 genes and 20,373 transcripts. This is the protein-based gene set. Rat and mouse 
cDNA and rat EST-based gene sets were also built. See Supplementary Information for 
details. 

Non-processed pseudogene identification 

Human and mouse genes related by 1:1 orthology and lacking an apparent rat orthologue 
were considered. See Supplementary Information for details. 

High-resolution analyses of chromosome 10 

These were performed predominantly on the whole genome alignments 217. Plots in Fig. 9 
were generated by sliding windows of width 2 Mb and a step size of 400 kb (total = 277 
windows). See Supplementary Information for details. 
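
The windowing scheme above (2-Mb windows advanced in 400-kb steps) can be sketched as below. The chromosome length is an assumption chosen only so that the sketch reproduces the stated total of 277 full-width windows; it is not taken from the source, and the source does not say whether partial windows at the chromosome end were kept.

```python
def sliding_windows(chrom_length, width=2_000_000, step=400_000):
    """Yield (start, end) half-open intervals for every full-width window."""
    start = 0
    while start + width <= chrom_length:
        yield start, start + width
        start += step


# Assumed length (bp), picked so the count matches the 277 windows quoted above.
ASSUMED_CHR10_LENGTH = 112_400_000

windows = list(sliding_windows(ASSUMED_CHR10_LENGTH))
print(len(windows))
print(windows[0], windows[-1])
```

With these settings the number of windows is (length - width) // step + 1, so consecutive windows overlap by 1.6 Mb.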
EXHIBIT 5 



Motor Neuron Degeneration in Mice That Express 
a Human Cu,Zn Superoxide Dismutase Mutation 

Mark E. Gurney,* Haifeng Pu, Arlene Y. Chiu, 
Mauro C. Dal Canto, Cynthia Y. Polchow, Denise D. Alexander, 
Jan Caliendo, Afif Hentati, Young W. Kwon, Han-Xiang Deng, 
Wenje Chen, Ping Zhai, Robert L. Sufit, Teepu Siddique 

Mutations of human Cu,Zn superoxide dismutase (SOD) are found in about 20 percent of 
patients with familial amyotrophic lateral sclerosis (ALS). Expression of high levels of 
human SOD containing a substitution of glycine to alanine at position 93— a change that 
has little effect on enzyme activity— caused motor neuron disease in transgenic mice. The 
mice became paralyzed in one or more limbs as a result of motor neuron loss from the spinal 
cord and died by 5 to 6 months of age. The results show that dominant, gain-of-function 
mutations in SOD contribute to the pathogenesis of familial ALS. 



Amyotrophic lateral sclerosis occurs in 
both sporadic and familial forms and results 
from the degeneration of motor neurons in 
the cortex, brainstem, and spinal cord. The 
disease typically begins in adults as an 
asymmetric weakness in two or more limbs 
and then progresses to complete paralysis 
(1). Familial ALS is inherited as an auto- 
somal dominant trait (2). About 10% of 
ALS cases are familial and, of these, ~20% 
have mutations in Cu,Zn superoxide dismutase 
(SOD) (3-5). SOD catalyzes the dismutation 
of superoxide radical (O2•−) into 
hydrogen peroxide and molecular oxygen. 
Familial ALS patients heterozygous for 
SOD mutations have 50 to 60% of the 
normal level of SOD activity in their red 
blood cells and brains (4, 6) . 

To explore how mutations in SOD 
might selectively cause motor neuron de- 
generation, we produced transgenic mice 
that express wild-type or mutant forms of 
human SOD (7, 8). Two mutations were 
analyzed: an Ala4 → Val substitution 
(A4V) and a Gly93 → Ala substitution 
(G93A) (3, 4). Previously described mice 
that express wild-type human SOD 
(NSOD) show no signs of overt motor 
neuron disease but do have mild pathologic 
changes in the innervation of muscle that 
are suggestive of premature aging (8, 9). 

Transgenic founder mice were produced 
by DNX (Princeton, New Jersey) or 
through the National Transgenic Develop- 
ment Facility (National Institutes of 
Health). Fertilized eggs for injection were 
obtained from crosses of (C57BL6 x SJL) 
F1 hybrid mice. Founder mice were bred 
with C57BL6 mice, and their progeny were 
used for subsequent analysis (10). Transgenic 
mice were identified by polymerase 
chain reaction amplification of tail DNA 
(11) and were screened for expression of 
human SOD in red blood cells by an anti- 
gen capture enzyme immunoassay (EIA) 



that used a polyclonal antibody to human 
SOD and the mouse monoclonal antibody 
SD-G6 (12). The EIA detected human 
SOD in G93A and NSOD mice, but not in 
A4V mice. However, Northern (RNA) 
analysis (13) and immunoblots (14) devel- 
oped with a different mouse monoclonal 
antibody (CZSODF2) demonstrated expression 
of human SOD1 mRNA and protein 
in the brains of G93A, NSOD, and 
A4V mice (Fig. 1, A to C). Thus, the A4V 
mutation altered an epitope needed for 
recognition in the EIA. 

The mutations of SOD found in familial 
ALS alter the stability of human SOD as 
shown by DNA transfection of cultured 
cells (15). Consistent with those results, we 
found that the mutant transgenic lines ex- 
pressed only one-half as much human SOD 
as did NSOD mice expressing comparable 
amounts of mRNA (Table 1). In addition, 
we found that the G93A mutation had 
little discernible effect on human SOD 
activity, whereas the A4V mutation greatly 
reduced enzymatic activity (15, 16). Al- 
though we detected enzymatically active 
mouse-human dimers in NSOD and G93A 
transgenic mice on SOD activity gels (17), 
we did not detect any active mouse-human 
A4V dimers. These results are compatible 
with the finding that recombinant human 
SOD bearing an Ala4 → Gln substitution is 
enzymatically inactive (18). 

Mice from one of the G93A transgenic 
lines (G1) (Table 1) that expressed the 
largest amounts of mutant SOD in the brain 







Fig. 1. (A) Northern analysis of human SOD1 mRNA expression in transgenic mouse brain. 
(B) The same membrane hybridized with a probe for G3PDH. (C) Expression of human SOD in 
transgenic brain by immunoblotting. Lanes contain samples from the following mice: 1, G1; 
2, G5; 3, G12.199; 4, G20; 5, A1073; 6, A1074; 7, N1026; 8, N1029; 9, N1030; 10, G12.15; 
and 11, nontransgenic littermate. (D) Partial pedigree of the G1 transgenic line. In the 
F2 generation, both males and females inherited the transgene, which indicates that the 
site of integration is autosomal. Circles, female; squares, male; filled symbols, 
transgenic mice; filled symbols with bar, affected mice. (E) Survival analysis showing 
the percentage of transgene-positive mice among the G1 siblings that are not impaired (■) 
or not moribund (●), measured at 10-day intervals of observation. G1 mice became 
noticeably impaired by 121 ± 23 days of age (mean ± SD, n = 5) and moribund by 
169 ± 16 days. (F) The condition of G1 transgenic mice deteriorated rapidly over the 
2-week period before their death, as shown by the shortening of their stride (individual 
symbols, mice G1.2, G1.6, and G1.8; dotted line, average stride of normal male mice). 
developed a stereotyped syndrome suggestive 
of motor neuron disease. The disease 
has not been observed in any line of NSOD 
mice expressing wild-type human SOD, nor 
have symptoms developed in any A4V 
mouse at comparable ages. At 3 to 4 



months of age, G1 mice began to show 
signs of hind limb weakness (Fig. 1E). They 
extended their hind legs less than normal 
when lifted by the base of the tail, their 
coats developed a coarse appearance sugges- 
tive of impaired grooming, and they ap- 



Fig. 2. Loss of spinal motor neurons 
in affected G1 transgenic 
mice. Spinal cords from a normal 
littermate (A) and a G1 transgenic 
mouse (B) show loss in the latter 
of lateral motor columns in the L4 
spinal segment (cresylecht-violet 
stain). (C and D) Spinal cords 
from a normal littermate (C) and a 
G1 transgenic mouse (D), show- 
ing loss in the latter of ChAT- 
positive ventral horn motor neu- 
rons in the L3 spinal segment 
(27). Lumbar spinal cords from 
an N1026 mouse (E) and a nor- 
mal littermate (F) show staining of 
ventral horn motor neurons with 
an antibody to human SOD (CZ- 
SODF2). (G through J) Normal 
littermate dorsal (G) and ventral 
(H) lumbar spinal roots and G1 
transgenic dorsal (I) and ventral 
(J) lumbar roots (stained with tolu- 
idine blue). The dorsal sensory 
roots were relatively spared (I), 
whereas severe loss of myeli- 
nated axons, myelin debris, and 
infiltrating phagocytic cells were 
apparent in the ventral motor 
roots (J). 
Fig. 3. Pathology in spinal cord and muscle of transgenic G1 mice. (A and B) Lumbar spinal 
segments from a normal littermate (A) and a G1 transgenic mouse (B), stained by the Bielschowski 
technique to reveal neurofibrils. (C) Nonspecific esterase stain of gastrocnemius, showing the low 
frequency of denervated, angulated muscle fibers (arrows) in G1 mice. (D) Sprouting and 
reinnervation of three denervated endplates in the gluteus muscle of a G1 mouse, revealed by a 
combined silver and cholinesterase stain (28). 



peared thin along their flanks. Normal mice 
have a fairly constant stride of 74 ± 1.6 mm 
(95% confidence interval, n = 50 mice) 
when using an alternating gait (19). G1 
mice had a normal stride at 3 to 4 months of 
age, but by 5 months of age it deteriorated 
rapidly (Fig. 1F). Over a span of 2 weeks, 
the mice became paralyzed in one or more 
limbs. The founder mouse and four of five 
transgenic F1 progeny developed paralysis of 
one or more hind limbs. A fifth transgenic 
F1 mouse (G1.6) retained use of his hind 
limbs but developed complete paralysis of 
his right forelimb. The six nontransgenic 
littermates of these mice showed no signs of 
disease. All affected mice developed a trem- 
or of the hind limbs when suspended in the 
air. They had a normal posture when quiet 
with the hind limbs held in flexion, but 
after initiating movement, their hind limbs 
and toes frequently locked in a hyperex- 
tended position. Affected mice became 
moribund by 5 months of age and were 
killed when they were no longer able to 
forage for food or water. 

The founder of the G1 line, all of his 
transgenic F1 progeny, and at least one male 
F2 mouse developed the same stereotyped 
syndrome suggestive of motor neuron dis- 
ease affecting both upper and lower motor 
neurons. The other lines of G93A trans- 
genic mice (Table 1) expressed smaller 
amounts of the mutant protein and so far 
have had normal motor behavior. In G1 
mice as well as in humans with ALS (2), 
the onset of the disease is dependent on 
age, so it is conceivable that the other lines 
of G93A mice may develop the disease at a 
later age. However, because the disease is 
expressed in only one line of mice, we 
cannot exclude the possibility that the site 
of integration of the transgene caused the 
disease syndrome in these mice. Disease is 
not due simply to overexpression of SOD in 
the brains of G1 mice, because NSOD mice 
that express comparable or greater amounts 
of total brain SOD do not develop the 
disease (10) (Table 1). 

Pathological analysis of G1 mice demonstrated 
a severe loss of choline acetyltransferase 
(ChAT)-containing spinal motor 
neurons (Fig. 2, A to D). A few motor 
neurons appeared normal, but most of the 
remaining neurons were filled with a neu- 
rofibrillar material (Fig. 3) that appeared to 
be phosphorylated neurofilaments (20). 
The most pronounced changes were ob- 
served in the ventral spinal cord, whereas 
the dorsal spinal cord, especially the sub- 
stantia gelatinosa, was better preserved. 
Immunohistochemical staining revealed 
large amounts of human SOD in ventral 
horn motor neurons, best shown in NSOD 
mice (Fig. 2, E and F). In G1 mice, there 
was severe loss of large, myelinated axons 
from the ventral motor roots (Fig. 2, H and 
Table 1. Expression in brain of human SOD1 mRNA, human SOD protein, and total SOD enzymatic 
activity in different transgenic mouse lines. All values are the mean ± SEM (n = 3), except where 
indicated. 

Line           Mutation       Gene copy      SOD1 mRNA (ng)/      Human SOD (ng)/        SOD (U)/
                              number*        10 μg of total RNA   total protein (μg)†    total protein

G1             Gly93 → Ala    18.0 ± 2.6     2.5 ± 0.5            4.1 ± 0.54             42.6 ± 2.1
G5                            4.0 ± 0.6      0.8 ± 0.1            1.3 ± 0.21             27.0 ± 2.9
G12                           2.2 ± 0.8      0.8 ± 0.1            1.1 ± 0.22             19.5 ± 0.8
G20                           1.7 ± 0.6      0.8 ± 0.1            0.7 ± 0.06             16.9 ± 0.4
A1073          Ala4 → Val     4.7 ± 0.4      1.1 ± 0.1            1.0 ± 0.21‡            14.6 ± 0.4
A1074                         3.2 ± 0.2      0.7 ± 0.1            0.9 ± 0.21‡            9.1 ± 0.4
N1029          Wild-type      7.2 ± 2.4      1.5 ± 0.1            6.7 ± 0.76             37.3 ± 1.9
N1026                         3.3 ± 1.0      0.4 ± 0.1            0.9 ± 0.11             18.6 ± 0.9
N1030                         1.7 ± 0.7      0.3 ± 0.1            0.6 ± 0.16             11.8 ± 0.3
Nontransgenic                                                                            10.4 ± 0.5

*Per diploid genome. †The amount of human SOD was determined by EIA. ‡Determined by 
immunoblotting (mean ± SEM of regression). 



J). The dorsal sensory roots appeared rela- 
tively spared when compared to the ventral 
roots; however, scattered swollen axons 
with dense axoplasm and occasional mye- 
lin-laden macrophages were observed at all 
levels of the spinal cord (Fig. 2, G and I) . 
These changes extended into the central 
component of the afferent sensory fibers 
within the dorsal columns of the spinal 
cord, a pathology also seen in familial ALS 
(21). Severe loss of myelinated axons occurred 
in intramuscular nerves, but less 
than 10% of muscle fibers had the charac- 
teristics of denervated fibers — that is, a 
small, angular profile and an esterase-posi- 
tive phenotype (Fig. 3C). 

To investigate whether sprouting and rein- 
nervation compensated for the destruction of 
motor units caused by the disease, we exam- 
ined whole mounts of the gluteus muscle of a 
G1 mouse (Fig. 3D). The muscles showed 
severe loss of myelinated axons from the 
intramuscular nerves and consequent reinner- 
vation of muscle fibers by primarily nodal 
sprouts. Ongoing reinnervation and remodel- 
ing of muscle innervation were indicated by 
the frequency of multiply innervated end- 
plates and by the scarcity of denervated end- 
plates. In one gluteus muscle, two surviving 
axons in the inferior gluteal nerve appeared 
sufficient to innervate more than 90% of the 
myofibers in the muscle. These data suggest 
that sprouting probably compensates for the 
loss of motor neurons until late in the course 
of the disease. 

Toxicity by a free-radical mechanism is 
one plausible explanation for motor neuron 
death in the G1 transgenic mice and, by 
implication, in humans with familial ALS. 
This mechanism could involve the formation 
of the strong oxidant peroxynitrite 
(ONOO−) from O2•− and nitric oxide 
(NO•) free radicals (22, 23). The formation 
of peroxynitrite and its decomposition into 



toxic chemical species have been linked to 
neurotoxicity in cell culture (24) and in 
brain ischemia (25). SOD mutations may 
facilitate this pathway of oxidative damage 
(26). Because formation of peroxynitrite is 
a second-order reaction that depends on the 
concentration of O2•− and NO•, decreased 
SOD activity in familial ALS may also 
contribute to pathogenesis if the amount of 
O2•− in tissues is increased (4). Our results 
indicate that dominant, gain-of-function 
mutations in SOD play a key role in the 
pathogenesis of familial ALS. 

REFERENCES AND NOTES 

1. D. W. Mulder, in Human Motor Neuron Diseases, 
L. P. Rowland, Ed. (Raven, New York, 1982), pp. 
15-22. 

2. T. Siddique, Adv. Neurol. 56, 227 (1991). 

3. D. R. Rosen et al., Nature 362, 59 (1993). 

4. H.-X. Deng et al., Science 261, 1047 (1993). 

5. T. Siddique, unpublished observations. 

6. A. C. Bowling, J. B. Schulz, R. H. Brown Jr., M. F. 
Beal, J. Neurochem. 61, 2322 (1993). 

7. The A4V mutation was introduced into exon 1 of 
the human SOD1 gene by two-primer mutagenesis 
with the polymerase chain reaction (PCR); the 
template for mutagenesis was a Sty I-Stu I fragment 
encompassing exon 1. The G93A mutation 
was cloned in a Hind III and Nsi I fragment 
encompassing exon 4 that was amplified from the 
genomic DNA of family 3-192 (3). These fragments 
were used to reassemble a complete 14.5-kb 
Eco RI-Bam HI fragment of the SOD1 gene [R. 
A. Hallewell, J. P. Puma, G. T. Mullenbach, R. C. 
Najarian, in Superoxide and Superoxide Dismutase 
in Chemistry, Biology and Medicine, G. 
Rotilo, Ed. (Elsevier, New York, 1986), pp. 249-256] 
in two more steps. Exons 1 and 4 of the 
transgenes were sequenced to verify that they 
contained only the desired mutation. The 14.5-kb 
Eco RI-Bam HI SOD1 transgene directs tissue-specific 
expression of human SOD in mice under 
control of the endogenous human promoter (5). 

8. C. J. Epstein et al., Proc. Natl. Acad. Sci. U.S.A. 
84, 8044 (1987). 

9. I. Ceballos-Picot et al., Brain Res. 552, 198 
(1991); K. B. Avraham et al., Cell 54, 823 (1988); 
K. B. Avraham, H. Sugarman, S. Rotshenker, Y. 
Groner, J. Neurocytol. 20, 208 (1991). 

10. Mice were housed in microisolator cages within a 
barrier facility. Frequent monitoring revealed no 
evidence for infection by viral or bacterial pathogens. 

11. The primers described (3) were used for identification 
of transgenic mice by PCR. Transgene 
copy number was estimated by Southern (DNA) 
DNA hybridization. Denatured DNA (10 μg) isolated 
from mouse tails or human placenta was 
transferred to a nitrocellulose membrane together 
with a dilution series of the cloned SOD1 gene. 
The membrane was hybridized with a random-primed, 
32P-labeled probe to sequences within 
the 3' untranslated region of the 0.9-kb human 
SOD1 complementary DNA (cDNA); these sequences 
are specific to the human transgene. 
Bound radioactivity was quantitated by phosphor 
image analysis, and linear regression was used to 
calculate transgene copy number. 

12. The EIA was constructed with a goat immuno- 
globulin G (IgG) antibody to human SOD (Chiron, 
Emeryville, CA) and a mouse monoclonal anti- 
body designated SD-G6 (Sigma, St. Louis, MO). 
Recombinant human SOD (Chiron) was used as a 
standard. Samples were diluted to within the 
log-linear range of the assay (0.1 to 1.5 ng of 
human SOD per well). There was no cross-reac- 
tivity with mouse SOD. 

13. Northern RNA hybridization was performed with 
10 μg of total brain RNA. The membrane was 
hybridized with a random-primed, 32P-labeled 
probe specific for the 3' untranslated region of the 
human SOD1 cDNA. Quantitation standards (a 
0.9-kb sense human SOD1 cDNA) were loaded 
on the gel with 10 μg of yeast RNA as a carrier, 
and the hybridization signal was analyzed by 
phosphor image analysis. To control for RNA 
loading variations, we rehybridized the blot with a 
glyceraldehyde 3-phosphate dehydrogenase 
(G3PDH) cDNA probe. 

14. Samples containing 2 μg of soluble brain protein 
were subjected to electrophoresis through 10% 
SDS-polyacrylamide gels, transferred to a nitro- 
cellulose membrane, and probed with antibody 
CZSODF2. Bound antibody was detected with a 
biotinylated horse antibody to mouse IgG and a 
Vector ABC kit. The membrane was developed 
with an enhanced chemiluminescence kit (Amer- 
sham, Arlington Heights, IL), and the chemilumi- 
nescence was quantitated by film densitometry. 
The amount of human SOD in brain extracts of 
A4V transgenic mice was determined by compar- 
ison to recombinant SOD standards in adjacent 
lanes. 

15. D. R. Borchelt et al., Proc. Natl. Acad. Sci. U.S.A., 
in press. 

16. Mouse brains were homogenized in cold 10 mM 
tris-HCl (pH 7.5) and 10 mM β-mercaptoethanol. 
After centrifugation at 50,000g for 15 min at 4°C, 
the protein content of the supernatant was measured 
by a bicinchoninic acid assay (Pierce, 
Rockford, IL). We assayed total SOD activity 
within brain extracts in microwells by measuring 
the inhibition of nitroblue tetrazolium reduction [D. 
R. Spitz and L. W. Oberley, Anal. Biochem. 179, 8 
(1989)]. Wells were monitored kinetically, and a 
(Vmax/V) − 1 transform (where V is velocity) [K. 
Asada, M. Takahashi, M. Nagate, Agric. Biol. 
Chem. 38, 471 (1974)] was used to linearize the 
data. Recombinant human SOD had an activity of 
6 U per nanogram. The contribution of Mn SOD in 
the sample was determined in the presence of 5 
mM sodium cyanide and was ~2% of the total 
SOD activity in the brain extract. 

17. O. Elroy-Stein, Y. Bernstein, Y. Groner, EMBO J. 5, 
615 (1986). 

18. R. A. Hallewell et al., Nucleic Acids Res. 13, 2017 
(1985). 

19. Mice were trained to walk up a 75-cm, U-shaped 
ramp that was inclined at one end against the wire 
lid of their cage. Testing was performed in a 
horizontal, laminar flow hood to maintain barrier 
conditions. A bright lamp was placed at the base 
of the ramp, and the cage lid was left in semidark- 
ness. The ramp obscured each mouse's view of 
the laminar flow hood and surrounding room. 
Testing was initiated by allowing the mice 1 to 2 
min to explore the cage lid and the top of the 



1774 



SCIENCE • VOL. 264 • 17 JUNE 1994 



ramp. The hind feet of the mice were painted with 
children's poster paints of contrasting colors. The 
tracks left by the mice as they ran up the ramp 
were recorded on paper tape. 

20. Degenerating neurons were positive for immuno- 
histochemical staining with SMI-31 monoclonal 
antibody (Sternberger Monoclonal Antibodies, 
Baltimore, MD) to phosphorylated neurofilaments, 
although the small number of motor neurons re- 
maining in affected spinal cords and their marked 
pathology require confirmation of this result. 

21. W. K. Engel, L. T. Kurland, I. Klatzo, Brain 82, 203 
(1959); A. Hirano, L. T. Kurland, G. P. Sayre, Arch. 
Neurol. 16, 232 (1967). 

22. J. S. Beckman et al., Proc. Natl. Acad. Sci. U.S.A. 
87, 1624 (1990). 

23. H. Ischiropoulos et al., Arch. Biochem. Biophys. 
298, 431 (1992). 

24. S. A. Lipton et al., Nature 364, 626 (1993). 
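
Two of the calculations described in the notes above lend themselves to a short illustration: the linear-regression estimate of transgene copy number from a dilution series (note 11) and the (Vmax/V) − 1 transform used to linearize the SOD inhibition data (note 16). The sketch below is hypothetical throughout. The function names and toy numbers are not from the source, and the second function assumes simple hyperbolic inhibition, V = Vmax / (1 + s/K), under which the transform is proportional to the amount of SOD, s.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_signal(standard_copies, standard_signals, sample_signal):
    """Read a sample's copy number off the fitted standard curve."""
    slope, intercept = fit_line(standard_copies, standard_signals)
    return (sample_signal - intercept) / slope


def sod_transform(v_max, v):
    """(Vmax/V) - 1, where v_max is the uninhibited rate and v the observed rate."""
    return v_max / v - 1


# Hypothetical dilution series: signal = 100 * copies + 5 (arbitrary units).
std_copies = [1, 2, 5, 10, 20]
std_signals = [105, 205, 505, 1005, 2005]
print(copies_from_signal(std_copies, std_signals, 1805))

# Hypothetical rates: with K = 2 and s = 3, V = 10 / (1 + 1.5) = 4.
print(sod_transform(10.0, 4.0))
```

In the second example the transform returns s/K, which is why plotting it against the amount of extract gives a straight line under the stated assumption.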



In September 1991, the frozen mummy of a 
man was found in the Tyrolean Alps. Ra- 
diocarbon dates of skin and bone samples 
indicated an age between 5100 and 5300 
years (1). Because no comparable archaeo- 
logical discovery exists, this find has at- 
tracted considerable scientific and public 
interest. It has also been the subject of 
various rumors and even allegations of fraud 
(2) . Molecular genetic investigations of the 
Ice Man could address some of the ques- 
tions surrounding the find. Comparisons of 
DNA sequences from the body with contemporary 
populations may reveal aspects of 
his ethnic affiliation. Molecular studies of 
other organisms such as viruses or bacteria 
associated with the body may furthermore 
illuminate the evolution of these orga- 
nisms. As a first step toward such investi- 
gations, we have analyzed the state of pres- 
ervation of the DNA in the Ice Man and 
determined the sequence of a hypervariable 
segment of the mitochondrial control re- 
gion from numerous samples removed from 
the body. 

Ancient DNA has been retrieved from a 
variety of plant, animal, and human re- 
mains (3, 4) that go back a few tens of 
thousands of years as well as from some 
fossils that are millions of years old (5-7) , 
although the latter results are partially con- 
troversial (8). In most cases, work on ar- 
chaeological DNA has been limited to mi- 
tochondrial DNA because its high copy 
number increases the chance of survival of a 
few molecules in the face of molecular 
damage that accumulates post mortem. Be- 

Fig. 1. Agarose gel electrophoresis of mito- 
chondrial DNA amplification of different lengths 
from the Ice Man. For every primer pair, ampli- 
fications from (A) an extract of the Ice Man and 
(B) an extraction control are shown. The primer 
pairs used are as follows: L16055/H16218 (202 
bp), L16055/H16303 (287 bp), L16055/H16410 
(394 bp), and L15997/H16498 (540 bp), where 
L and H refer to the light and heavy strand, 
respectively, followed by the number to the 
nucleotide position (14) at the 3' end of the 
primer. Migration positions of molecular size 
markers are given in numbers of base pairs. 

cause the body of the Ice Man has been 
frozen with the exception of a short period 
after its discovery, its DNA may be pre- 
served better than that of other finds. This 
unusual condition might allow nuclear 
markers such as microsatellites to be studied 
in addition to mitochondrial DNA and thus 
open several additional avenues of study. 

A total of eight samples of muscle, 
connective tissue, and bone were removed 
under sterile conditions from the left hip 
region of the body, which had been dam- 
aged during salvage of the mummy. Addi- 
tionally, parts of one sample that has been 
radiocarbon dated (1) were analyzed. Ex- 
tracts of DNA were made from 10 to 200 
mg of each sample by a silica-based method 
that is highly efficient in the retrieval of 
ancient DNA (9) . Enzymatic amplifications 
from the mitochondrial control region were 
attempted. Because this region encodes no 
structural gene products and evolves faster 
than other parts of the mitochondrial ge- 
nome, it is particularly suited for the recon- 
struction of the history of human popula- 





Fig. 2. Quantitation of mitochondrial DNA in an 
extract from the Ice Man. A dilution series of a 
competitor template, containing a 20-bp inser- 
tion in a mitochondrial fragment, was added to 
a constant amount of extract, and a PCR that 
used primers L16068/H16218 was performed 
as described in (10). The numbers above the 
lanes indicate the numbers of competition mol- 
ecules added to the amplifications. (A) An 
extraction control and (B) a control where no 
template was added. Migration positions of 
molecular size markers are given in numbers of 
base pairs. 
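
The competitive quantitation described in the Fig. 2 caption is commonly read out by finding the competitor input at which target and competitor bands are equally intense. The sketch below shows one way to estimate that point, assuming band-intensity ratios are available and that log10(target/competitor) is linear in log10(competitor copies). The function names and numbers are hypothetical, and this is not claimed to be the authors' procedure, which is described in their reference (10).

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def equivalence_point(competitor_copies, target_over_competitor):
    """Competitor copy number at which the band-intensity ratio equals 1."""
    xs = [math.log10(c) for c in competitor_copies]
    ys = [math.log10(r) for r in target_over_competitor]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (-intercept / slope)


# Hypothetical dilution series and ratios, built for 200 target molecules.
competitor = [1e4, 1e3, 1e2, 1e1]
ratios = [200 / c for c in competitor]
print(equivalence_point(competitor, ratios))
```

The estimate is the number of template molecules in the volume of extract added to each amplification, under the assumption that target and competitor amplify with equal efficiency.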



Molecular Genetic Analyses of the 
Tyrolean Ice Man 

Oliva Handt, Martin Richards, Marion Trommsdorff, 
Christian Kilger, Jaana Simanainen, Oleg Georgiev, Karin Bauer, 
Anne Stone, Robert Hedges, Walter Schaffner, Gerd Utermann, 
Bryan Sykes, Svante Pääbo* 

An approximately 5000-year-old mummified human body was recently found in the Ty- 
rolean Alps. The DNA from tissue samples of this Late Neolithic individual, the so-called 
"Ice Man," has been extracted and analyzed. The number of DNA molecules surviving in 
the tissue was on the order of 10 genome equivalents per gram of tissue, which meant that 
only multi-copy sequences could be analyzed. The degradation of the DNA made the 
enzymatic amplification of mitochondrial DNA fragments of more than 100 to 200 base pairs 
difficult. One DNA sequence of a hypervariable segment of the mitochondrial control region 
was determined independently in two different laboratories from internal samples of the 
body. This sequence showed that the mitochondrial type of the Ice Man fits into the genetic 
variation of contemporary Europeans and that it was most closely related to mitochondrial 
types determined from central and northern European populations. 
Transgenic overexpression of Cu+2/Zn+2 superoxide dismutase 1 
(SOD1) harboring an amyotrophic lateral sclerosis (ALS)-linked 
familial genetic mutation (SOD1 G93A) in a Sprague-Dawley rat 
results in ALS-like motor neuron disease. Motor neuron disease in 
these rats depended on high levels of mutant SOD1 expression, 
increasing from 8-fold over endogenous SOD1 in the spinal cord of 
young presymptomatic rats to 16-fold in end-stage animals. Disease 
onset in these rats was early, ≈115 days, and disease progression 
was very rapid thereafter with affected rats reaching end 
stage on average within 11 days. Pathological abnormalities included 
vacuoles initially in the lumbar spinal cord and subsequently 
in more cervical areas, along with inclusion bodies that 
stained for SOD1, Hsp70, neurofilaments, and ubiquitin. Vacuolization 
and gliosis were evident before clinical onset of disease and 
before motor neuron death in the spinal cord and brainstem. Focal 
loss of the EAAT2 glutamate transporter in the ventral horn of the 
spinal cord coincided with gliosis, but appeared before motor 
neuron/axon degeneration. At end-stage disease, gliosis increased 
and EAAT2 loss in the ventral horn exceeded 90%, suggesting a 
role for this protein in the events leading to cell death in ALS. These 
transgenic rats provide a valuable resource to pursue experimentation 
and therapeutic development, currently difficult or impossible 
to perform with existing ALS transgenic mice. 



Amyotrophic lateral sclerosis (ALS) is a late-onset neuromuscular 
disorder characterized by progressive motor dysfunction 
that leads to paralysis and eventually death. The pathology of the 
disease results from the death of large motor neurons in the spinal 
cord and brainstem (1, 2). ALS occurs in both sporadic and familial 
forms (3). Familial ALS accounts for ≈5-10% of all reported cases. 
Approximately 15-20% of familial ALS cases have been linked to 
inheritance in an autosomal dominant fashion of a mutant form of 
Cu+2/Zn+2 superoxide dismutase 1 (SOD1) (4, 5). SOD1 normally 
functions in the regulation of oxidative stress by conversion of free 
radical superoxide anions to hydrogen peroxide and molecular 
oxygen. Over 90 distinct familial SOD1 mutations have been found 
to date. SOD1 mutations that have been tested in transgenic mice 
result in ALS-like motor neuron disease (6-8), but SOD1-null mice 
do not develop motor neuron disease (9). Furthermore, crossing 
SOD1-null mice with transgenic ALS mice does not alter disease 
onset or progression (10). Taken together, these results indicate 
that familial ALS does not result from loss of SOD1 function but 
rather an unidentified gain of function. There is no consensus as to 
the mechanism, and theories include alterations in SOD1 folding, 
oxidative stress from aberrant catalysis (11), or cytoplasmic aggregates 
(12). New studies also suggest that the disease is not cell 
autonomous, in that nonneuronal cells are necessary for motor neuron 
degeneration (13, 14). 

Transgenic mouse models expressing mutant forms of SOD1
(15-21) develop neuromuscular disease very similar to human
ALS. Age of onset of disease varies as a function of both the type
of mutant expressed in mouse and the relative expression levels
attained. High expressing SOD1 G93A (13-fold above endogenous
SOD1) and G37R SOD1 (7- to 14-fold above endogenous SOD1)
transgenic mice contain membrane-bound vacuoles in cell bodies
(15, 22) and dendrites (15, 16, 22), which most likely result from
degenerating mitochondria. Lower expressing SOD1 G93A mice
(7-fold above endogenous SOD1) also contain Lewy-body-like
cytoplasmic inclusions in the cell bodies of motor neurons (21)
containing SOD1, ubiquitin, and phosphorylated neurofilament
(23). SOD1 G85R transgenic mice expressing mutant SOD1 at as
little as 20% of endogenous levels also develop neuromuscular
disease characterized by loss of large motor neurons in brainstem
and in spinal cord (10, 17). No vacuolization has been reported
in G85R mice or in similar mice expressing the murine counterpart
mutation G86R (18, 24). However, these mice also
develop cytoplasmic inclusions that appear in astrocytes and
neurons before clinical signs of disease and dramatically increase
in abundance with disease progression (10). SOD1 G85R mice
have also been shown to be deficient in the spinal cord astroglial
glutamate transporter EAAT2 (GLT-1), similar to observations
in sporadic ALS (25), suggesting that astroglial dysfunction in
ALS may contribute to motor neuron degeneration.

We sought to create a transgenic rat model for ALS by using
mutant SOD1 to pursue experimental paradigms currently difficult
or impossible to achieve in the smaller transgenic mouse
models. Rats provide an advantage in pursuit of therapeutic
strategies such as stem cell replacement and are the preferred
laboratory animal species for pharmacological manipulations.

Materials and Methods 

Generation and Characterization of Transgenic Rats Expressing Human
SOD1 G93A. A 12-kb EcoRI/BamHI restriction fragment of the
human SOD1 gene harboring the G93A mutation was microinjected
into Sprague-Dawley rat embryos. Transgenic rats were
produced as described (26). Embryos were allowed to develop to
term and were analyzed for the presence of the transgene. Tail
biopsies from S-day-old rats were digested in proteinase K and 
then diluted 1:20 in dH2O followed by heating at 95°C for 15 min.
Two microliters were subjected to PCR by using primers SOD-Df
(5'-GTGGCATCAGCCCTAATCCA-3') and SOD-E4r
(5'-CACCAGTGTGCGGCCAATGA-3') specific to human SOD1
to determine the genotypes of founders and offspring.

Taqman quantitative DNA PCR was performed to determine
DNA copy number of transgene loci segregating from the
multiintegrant founders 26, 46, and 51 to their respective F1
generation progeny. Primer-probe sets specific for human SOD1
and an internal normalizer gene, Thy1.2, were used in multiplex
PCR on a Taqman 7700 PCR thermocycler (PE Biosystems)
following the manufacturer's recommended conditions. Data
were represented as delta cycle threshold (dCt) and were
converted to relative transgene DNA copy number by the
equation 2^(-dCt) (Table 1).

Table 1. Multiple transgene integrations in SOD1 G93A founders were resolved into
individual lines

Line  Subline  dCt    Copy no.  Spinal cord   Blood         Pathology
                                hSOD1/rSOD1   hSOD1/rSOD1
26    26L      -3     8         nd            0.4           None
      26H      -6     64        8.6           0.6           ~115 days
      26HL     -7     72        10.4          1.1           ~102 days
46    46L      -0.5   1-2       0.4           <0.1          None
      46H      -2.5   5-6       2.6           0.2           None
51    51L      -2     4         U             0.5           None
      51H      -4     16        5.S           1.2           None
61             -2-3   4-5       2.4           0.1           None

nd, not determined; H, high copy; L, low copy; h, human; r, rat; dCt, delta cycle threshold.
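The dCt-to-copy-number conversion described above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis pipeline: the function name `copy_number_from_dct` and the dictionary `table_dct` are hypothetical, and the dCt values are those rows of Table 1 that are unambiguous in this copy.

```python
def copy_number_from_dct(dct: float) -> float:
    """Relative transgene copy number from a delta cycle threshold: 2^(-dCt)."""
    return 2.0 ** (-dct)


# dCt values as listed for selected sublines in Table 1 (illustrative subset).
table_dct = {
    "26L": -3.0,
    "26H": -6.0,
    "46L": -0.5,
    "46H": -2.5,
    "51L": -2.0,
    "51H": -4.0,
}

# Convert every dCt to a relative copy number.
copies = {subline: copy_number_from_dct(dct) for subline, dct in table_dct.items()}

for subline, n in copies.items():
    print(f"{subline}: dCt {table_dct[subline]:+.1f} -> ~{n:.1f} copies")
```

Under this formula a dCt of -3 corresponds to 8 copies and a dCt of -6 to 64 copies, and the fractional values (-0.5 and -2.5) fall within the ranges reported as "1-2" and "5-6" copies.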

Quantitation of SOD1 in Blood and SOD1 and EAAT2 in Spinal Cord.

Blood samples from tail vein bleeds were solubilized in 10 vol of 50
mM Tris-HCl, pH 7.5/150 mM NaCl/5 mM EDTA/1% Nonidet
P-40/1% SDS. For SOD1 detection, 23 µg was electrophoresed on a
12% SDS/polyacrylamide gel and transferred to nitrocellulose.
Cervical spinal cord was homogenized in 2 ml of 50 mM Tris-HCl,
pH 7.5/150 mM NaCl/5 mM Na2EDTA/1% Nonidet P-40/1%
SDS, and 5 µg was electrophoresed as described above. For
detection of EAAT2, ventral horn of cervical spinal cord was
dissected by using Oi-ram micropunches (Zivic-Miller) and
homogenized as described above, and 25 µg of total protein was
electrophoresed on 7.5% SDS/polyacrylamide gels. Western blots
were probed with either anti-SOD1 (27) (1:5,000), anti-GLT-1
(EAAT2; 1:1,000; Chemicon), or anti-actin (C4; 1:10,000; Roche
Molecular Biochemicals) Abs.

Immunohistochemical Analyses. Animals were killed by using approved
animal welfare protocols and perfused by cardiac puncture
with 4% paraformaldehyde/PBS. Muscle, brain, and spinal
cord were removed followed by regional dissection of spinal cord
and spinal nerve roots. Tissue blocks were embedded in paraffin
or araldite for sectioning (7 and 1 µm, respectively). Immunostains
and semithin plastic sections were processed as described
(16, 17, 28). Hematoxylin and eosin stains of muscle and spinal
cord were performed on paraffin sections, whereas semithin
sections of spinal roots were stained with toluidine blue.
Immunostaining was performed with Abs to neurofilament with
SMI-32 (1:8,000; Sternberger-Mayer, Jarrettsville, MD), glutamate
transporter GLT-1 (1:1,000), SOD1 (1:10,000), heat shock
protein (HSP70; 1:100; StressGen Biotechnologies, Victoria, BC,
Canada), ubiquitin (1:1,500; Dako), and glial fibrillary acidic
protein (1:50; Dako).

Electrophysiological Recording. Electromyography (EMG) and
nerve conductions were performed by using an ADI (Greenwich,
CT) PowerLab/8SP stimulator and BioAmp amplifier
followed by computer-assisted data analysis (Chart 4.0 and
Scope; ADI). Compound muscle action potentials were
recorded by stimulating the sciatic nerve at the sciatic notch and
recording from the foot. EMG was performed by using a bipolar
needle and sampling at 200 Hz.



Results 

Multiple Transgenic Rat Lines Express Mutant SOD1 G93A. We identified
three SOD1 G93A founders that expressed mutant human
SOD1 in blood (Fig. 1A). A fourth (founder 46) showed no
detectable SOD1 G93A expression; however, subsequent immunoblots
of whole blood from the F1 animals of this line did
indeed show low-level SOD1 G93A (not shown). These four
founders were bred to the F1 generation to establish transgenic
lines.

Transgene transmission frequency to the F1 generation was
greater than the expected 50% in lines 26, 46, and 51 and was
determined to be the result of multiple transgene integration
sites in each of these founders. Distinct transgene integrations
can be resolved by using quantitative Taqman PCR if the number
of transgene copies differs at each chromosomal site. This was
indeed the case for lines 26, 46, and 51. Taqman PCR data were
used to track inheritance of distinct low- or high-copy transgene
loci by F1 generation animals, thereby allowing us to establish
separate sublineages for each of these lines (Table 1).
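The reasoning that transmission above 50% points to more than one integration site can be made concrete with a small enumeration. This is an illustrative sketch under stated assumptions that the source does not spell out: the founder is hemizygous at each site, the sites are unlinked and segregate independently, and the founder is not mosaic. The function name `f1_classes` and the locus labels "H" and "L" are hypothetical.

```python
from itertools import product


def f1_classes(loci):
    """Expected F1 genotype classes for a founder hemizygous at each locus.

    Assumes independent (unlinked) Mendelian segregation, so every
    combination of inherited / not-inherited loci is equally likely.
    """
    classes = {}
    for inherited in product([True, False], repeat=len(loci)):
        label = "".join(name for name, got in zip(loci, inherited) if got) or "none"
        classes[label] = 0.5 ** len(loci)
    return classes


# Two integration sites, e.g. a high-copy (H) and a low-copy (L) locus.
classes = f1_classes(["H", "L"])
transgenic_fraction = 1.0 - classes["none"]

for label, p in classes.items():
    print(f"{label}: {p:.0%}")
print(f"transgene-positive F1: {transgenic_fraction:.0%}")
```

With a single site the expected transmission is 50%; with two unlinked sites three of the four equally likely classes carry at least one locus, which is consistent with the higher-than-expected transmission and with the HL, H-only, and L-only sublines described for line 26.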

Development of Motor Neuron Disease in SOD1 G93A Transgenic Rats.

SOD1 G93A founder 26 developed motor neuron disease at 93
days of age, whereas all other founders did not develop disease.
Because of the multiple integration of the transgene, the F1
generation animals derived from founder 26 inherited either the
high- or low-copy transgene locus or both (see Table 1). F1
animals containing only the low-copy locus (L26L) did not
develop motor neuron disease. F1 animals that inherited both
loci from founder 26 (L26HL) developed motor neuron disease
by 93 days of age, the same age as disease onset in the founder.
F1 animals that inherited only the high-copy locus (L26H)
developed motor neuron disease between 104 and 121 days of age.
The apparent earlier onset in L26HL vs. L26H animals most
likely was the result of slightly higher mutant SOD1 expression
(Table 1). Because the single high-copy locus (L26H) in rats was
sufficient to elicit motor neuron disease, we chose to breed this
subline to the F2 and subsequent generations for further analysis.

Mutant SOD1 Expression in SOD1 G93A Transgenic Rats. SOD1 G93A
expression in the spinal cord of L26H transgenic rats was
determined to be ~8-fold above endogenous SOD1 as assessed
by immunoblot analysis of young presymptomatic animals (Fig.
1C; Table 1). As expected, these levels exceeded those in other transgenic
rat lines that did not go on to develop motor neuron disease
(Table 1). SOD1 G93A expression in L26H rats was also evident
across many brain regions as well as peripheral tissues (Fig. 1B),
similar to that seen in described SOD1 transgenic mice (16). By
end stage, mutant SOD1 levels accumulated ~16-fold over
endogenous, representing a further 2-fold increase in SOD1 G93A
compared with levels in young presymptomatic rats (6 weeks old)
(Fig. 1C). Spinal cord SOD1 G93A levels were directly compared
in end-stage L26H transgenic rats and the previously described
G1H and G1L transgenic mice, which also express SOD1 G93A.
We found that SOD1 G93A levels in end-stage G1H and G1L (15,
21, 22) transgenic mice were 3- and 1.5-fold, respectively, higher
than levels attained in end-stage L26H transgenic rats (data not
shown).

Fig. 1. Mutant SOD1 expression and disease in SOD1 G93A transgenic rats.
SOD1 expression in blood from transgenic founders (A) is highest in founder
number 26. L26H F1 generation rats exhibit SOD1 G93A expression throughout
the nervous system and peripheral tissues (B). SOD1 G93A expressed at ~8-fold
over endogenous in young (6 weeks) presymptomatic transgenic rat spinal
cord increases to ~16-fold by end-stage disease (16 weeks) (C). Normal
age-matched littermate control animal (D) at ~120 days compared with an
end-stage transgenic rat showing signs of muscle wasting and paralysis of both
hindlimbs and one forelimb. Kaplan-Meier survival curve (n = 25) generated
from F2 generation transgenic rats depicting disease onset and survival (E).

Characterization of Motor Neuron Disease in SOD1 G93A L26H Transgenic
Rats. A subset (n = 25) of F2 generation animals for L26H
were observed closely for onset of disease symptoms, as well as
progression to death. Onset of motor neuron disease was scored
as the first observation of an abnormal gait or evidence of
hindlimb weakness. Affected animals were tested daily for the
ability to right themselves after being turned on either side for
a maximum of 30 sec; failure at this task was seen in end-stage
animals and scored as "death" (see Fig. 1E). All end-stage
animals were killed. Righting reflex failure was coincident with
complete paralysis of both hindlimbs and at least 1 forelimb. F2
L26H transgenic rats had an average age of onset of motor
neuron disease of 115 days. Onset typically appeared as hindlimb
abnormal gait and progressed very quickly (1-2 days) to overt
hindlimb paralysis, typically affecting one limb first. Within 1-2
days, the second hindlimb was involved, although animals could
still ambulate through the use of forepaws. Affected rats showed
signs of weight loss, poor grooming, and porphyrin staining
around the eyes. L26H transgenic rats typically reached end-stage
disease very quickly, an average of 11 days after onset of
symptoms. All F2 generation L26H transgenic rats monitored
for this study reached end-stage disease within 173 days after
birth.

Fig. 2. Muscle atrophy and denervation in SOD1 G93A rats. Leg muscle
myofibers from end-stage (age >120 days) SOD1 G93A rats were often seen as
groups of atrophic angular fibers (b, arrows), compared with age-matched
control rats (a). Compound muscle action potential in nontransgenic control
foot muscle (c; 5.48 mV) was reduced in the foot (d; ?43 mV) in presymptomatic
rats and was almost unobtainable in end-stage foot (e; 0.71 mV) after
supramaximal stimulation (1 ms per division). Needle EMG of presymptomatic
SOD1 G93A rat (g) demonstrates a rare fibrillation potential recorded in the
lumbosacral paraspinous muscles compared with age-matched wild-type control
rat (f). EMG of end-stage (>125 days of age) SOD1 G93A rat (h) revealed
continuous fibrillation potentials and positive sharp waves (20 ms per division).

Muscle Pathology and Impaired Function in SOD1 G93A L26H Rats. Leg

muscles (distal and proximal) from end-stage rats revealed
obvious and frequent angular atrophic myofibers, most often in
discrete clumps typical of neurogenic atrophy (Fig. 2b), whereas
muscles from wild-type littermate controls were normal (Fig.
2a). In parallel, electrophysiologic recordings from end-stage
SOD1 G93A rats (n = 5) exhibiting obvious hindlimb paralysis or
paresis revealed markedly reduced amplitude of compound
motor action potentials (CMAP) in the intrinsic foot muscles
(Fig. 2e), indicating motor neuron loss (compare 5.48 mV in
wild-type animals to 0.71 mV in end-stage SOD1 G93A rats).
CMAP in presymptomatic animals was diminished only partially
(Fig. 2d; n = 5) compared with littermate controls (Fig. 2c).
Needle EMG of age-matched wild-type rats demonstrated absence
of any spontaneous activity, compared with rare fibrillation
potentials in paraspinal muscles from presymptomatic rats
(Fig. 2 f and g). Continuous fibrillation potentials and positive
sharp waves were evident in leg muscles from end-stage L26H
rats (Fig. 2h).
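To put the reported CMAP amplitudes on a common scale, the drop from wild-type to end-stage can be expressed as a percentage. This is a small worked calculation only; the helper name `percent_reduction` is hypothetical, and the two amplitudes are the single values quoted in the text rather than group statistics.

```python
def percent_reduction(control_mv: float, test_mv: float) -> float:
    """Percentage decrease of test_mv relative to control_mv."""
    if control_mv <= 0:
        raise ValueError("control amplitude must be positive")
    return 100.0 * (1.0 - test_mv / control_mv)


# Amplitudes quoted in the text (mV): wild-type vs. end-stage SOD1 G93A rat.
wild_type_mv = 5.48
end_stage_mv = 0.71

end_stage_drop = percent_reduction(wild_type_mv, end_stage_mv)
print(f"End-stage CMAP reduction: {end_stage_drop:.1f}%")
```

On these two values the end-stage amplitude is roughly one-eighth of the wild-type amplitude, a reduction of about 87%.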

Immunohistochemical Characterization of SOD1 G93A L26H Transgenic

Rats. Analysis of hematoxylin/eosin-stained sections of lumbar
spinal cord from end-stage SOD1 G93A rats (n = 10) revealed a
dense gliosis with a complete loss of ventral large motor neurons
(alpha-motor neurons) as shown in Fig. 3 e and f compared with
similarly aged wild-type rats (Fig. 3 a and b). Closer inspection
demonstrated frequent "glial nodules" (Fig. 3f, arrows), representing
active degeneration and engulfment of neurons. Inspection
of ventral horn gray matter from lumbar spinal cord from
presymptomatic rats (~90-100 days of age) revealed a normal
population of motor neurons but a profound vacuolar degeneration
of the neuropil (Fig. 3 c and d), similar to that seen in
end-stage SOD1 G93A mice (15, 22). However, in the rat these
vacuoles were transient, appearing at the time of active motor
neuron loss but were nearly absent in the lumbar cord by
end-stage disease (Fig. 3 e and f). Brainstem and cervical spinal
cord of end-stage rats also revealed vacuolar and glial nodule
changes in motor neurons (not shown), albeit these appeared
later in these regions, again consistent with vacuolar presence
preceding neuronal loss. In concert with the changes in gray
matter, ventral roots from end-stage SOD1 G93A rats were atrophic
(Fig. 3i). On closer inspection (Fig. 3j) active degeneration
of most axons was observed with macrophage infiltration and
myelin ovoids. In contrast, analysis from presymptomatic
SOD1 G93A rat ventral roots (n = 5) showed almost
normal-appearing axons (Fig. 3h), compared with age-matched controls
(Fig. 3g), with rare axons (1-2 per root) undergoing degeneration
(Fig. 3h, arrow).

Fig. 3. Motor neuron and axon loss in SOD1 G93A rats. Ventral spinal cord gray
matter reveals vacuolar degeneration in the neuropil of presymptomatic
SOD1 rats (c and d) and astrogliosis and loss of motor neurons in end-stage
rats (e and f) compared with age-matched wild-type rats (a and b). Glial
nodules, around remnants of degenerating motor neurons, were evident
throughout the ventral gray matter (f, arrows). Ventral motor roots from an
end-stage rat (i) were atrophic compared with age-matched control roots
(g). Closer inspection revealed active ongoing degeneration in end-stage
SOD1 G93A ventral roots (j), whereas roots from presymptomatic rats showed
little degeneration (h). Magnification: x4, g and i; x10, a, c, and e; x40, b, d,
and f; x100, h and j.

As was reported in earlier examples of SOD1 mutant-mediated
disease in mice (10), onset of clinical disease was
accompanied by aggregates of SOD1 throughout the rat ventral
horn (Fig. 4b) and brain (not shown), especially in prominent
focal deposits in which mutant SOD1 immunoreactivity was
frequently most robust at the perimeter. Similar pathology was
not found in nontransgenic controls (Fig. 4a). These aggregates
could be found in a few surviving motor neuron perikarya, axons,
and surrounding glia. Aggregates were intensely stained with
Abs to ubiquitin (Fig. 4g), consistent with disruption in protein
clearance by the proteasome. Aggregates also contained endogenous
Hsc70, especially within surviving motor neuron cell
bodies (Fig. 4c), suggesting mutant-dependent depletion of the
intracellular protein folding chaperone pool. Aberrant accumulations
of neurofilaments, reported in SOD1 G93A-expressing
transgenic mice (29), were prominent abnormalities after disease
onset, both in perikarya (Fig. 4d) and in distended axonal
swellings [compare the neurofilament staining in transgenic
axons (Fig. 4f) with that of the wild-type littermate controls (Fig.
4e)]. These accumulations were selective for axons within the
ventral root exit zone and were not found in the dorsal ascending
columns (not shown).

Fig. 4. Aberrant accumulations of proteins in SOD1 G93A rats. Accumulation
in the neuropil of SOD1 G93A rats (b) compared with age-matched wild-type
rats (a). Hsc70 (c) and ubiquitin (Ub) (g) were abnormally accumulated in the
neuropil and cytoplasm of ventral gray neurons. Similarly, neurofilament (nf)
aggregates were found in the soma of large motor neurons (d) and their
axons, often in spheroid structures in the neuropil and especially in the ventral
root zone white matter (f) compared with dorsal white tracts (e).

EAAT2 Deficits in the Ventral Horn Spinal Cord of SOD1 G93A L26H Rats.

EAAT2 is the predominant glutamate transporter in the central
nervous system, normally expressed widely throughout the spinal
gray matter (Fig. 5a) in astrocytes but not in motor neurons
(arrows, Fig. 5b). Previous studies have documented a profound
loss of the protein in sporadic and familial ALS (25, 28, 30). In
presymptomatic SOD1 G93A rats, just before disease onset, motor
neurons are still present (Fig. 5c, arrows) and ventral horns have
not started to degenerate (Fig. 3h). At this time point, there is
an obvious patchy loss of EAAT2 immunoreactivity in the
ventral horn (Fig. 5c). By end stage, there is a profound focal loss
of EAAT2 immunoreactivity despite a striking increase in the
number of astrocytes (Fig. 5 d and e). These changes were
mirrored by a quantitative loss of EAAT2 immunoreactivity
measured from immunoblots of extracts from spinal cord,
especially in the ventral gray regions (Fig. 5f). Assays of glutamate
transport also confirmed a nearly 50% loss of functional transport
(data not shown). Astroglial reactivity, as revealed by glial
fibrillary acidic protein immunostaining, also showed activation
before motor neuron degeneration, in presymptomatic spinal
cord ventral gray (Fig. 5h) compared with nontransgenic controls
(Fig. 5g), followed by a more dramatic activation (Fig. 5 i
and j) in end-stage tissue.

Discussion 

We have generated transgenic Sprague-Dawley rats that express
human mutant SOD1 G93A at levels ~8-fold over endogenous
SOD1 in the spinal cord of young presymptomatic rats. This level
of expression was sufficient to cause an ALS-like motor neuron
disease in rats by 3-4 months of age. Additional transgenic lines
expressing mutant SOD1 between 0.1- and ~6-fold over endogenous
levels of SOD1 have not developed any signs of motor
neuron disease by 1 year of age. Recapitulation of an ALS-like
motor neuron disease in the transgenic rat using the G93A
mutant SOD1 clearly depended on the ability to obtain high-level
transgene expression in the spinal cord, as reported for the
SOD1 G93A (15) and SOD1 G37R (16) transgenic mice.

Fig. 5. Astroglial alterations in SOD1 G93A rats. The usual ubiquitous astroglial
expression of the glutamate transporter EAAT2 (a and b), surrounding motor
neurons (arrows), was markedly altered in SOD1 G93A rats with a patchy loss in the
ventral horn in presymptomatic rats (c) and almost a complete loss of protein
in end-stage ventral gray from SOD1 G93A rats (d and e). This loss of EAAT2 (GLT)
was paralleled in immunoblots from ventral gray of presymptomatic and
end-stage rats (f). In parallel, astroglial expression of glial fibrillary acidic protein
(GFAP) increased somewhat in presymptomatic ventral gray (h), compared with
age-matched wild-type control (g), and was markedly increased in end-stage rats
(i), especially around rare motor neuron profiles (j).

1608 | www.pnas.org/cgi/doi/10.1073/pnas.032539299



No overt motor neuron loss was evident in presymptomatic
SOD1 G93A transgenic rats between 3-4 months of age as determined
by both histological and electrophysiological observations.
However, we noted the appearance of vacuoles in motor
neurons as well as gliosis preceding both motor neuron loss and
clinical signs of disease in rats. The presence of vacuoles was
transient, correlating with the time of active motor neuron loss.
In the most affected regions vacuoles were nearly absent by
end-stage disease. Progression to end-stage paralysis was rapid,
with an average of 11 days after first observation of symptoms.
This finding is in contrast to the slower progression of disease
observed in SOD1 G93A transgenic mice (G1H and G1L) where
disease duration approached 60-70 days (15, 21, 22) but instead
was more similar to that reported for SOD1 G85R mice (17) whose
disease duration was only 7-14 days. Mutant SOD1 levels in
end-stage G1L and G1H transgenic mouse spinal cord (15, 21,
22) were higher than in SOD1 G93A L26H transgenic rats, and
therefore the rapid progression of disease in the SOD1 G93A
transgenic rats seems not to be a function of expression levels but
rather may be characteristic of a species difference in the
presentation of clinical phenotype. The rapid decline of the
SOD1 G93A rats coincided with substantial loss of spinal cord
motor neurons as well as marked increases in gliosis and
degeneration of muscle integrity and function.

The astroglial glutamate transporter EAAT2 is the primary
means of maintaining low extracellular glutamate levels. Loss of
this protein induced by either pharmacological or molecular
methods in vitro and in vivo results in increased extracellular
glutamate, as measured by microdialysis, and excitotoxic neuronal
degeneration, including degeneration of motor neurons.
Elevations of extracellular glutamate and loss of EAAT2 are
characteristic of at least 40% of sporadic patients with ALS, and
similar changes have been observed in the mutant mouse models
of the disease (17, 25, 31-33). Interestingly, a recent study of a
similar transgenic rat model, however, did not observe changes
in cerebrospinal fluid (CSF) glutamate (34). The reason for the
difference between that rat model, the work in the current study,
and previous human observations is not clear. However, a focal
loss of EAAT2 would be expected to increase glutamate only
locally and therefore might not be detectable in the CSF. In
addition, CSF glutamate measurements are fraught with technical
problems.

The cause of EAAT2 loss is not known, but multiple studies
demonstrate that astroglial changes can occur early, before
actual motor neuron degeneration (13, 17). However, loss of
neurons can lead to glial responses that include transient
downregulation of EAAT2 expression (35, 36). Yet, there is no loss
of EAAT2 in another motor neuron disease, spinal muscular
atrophy (33, 37). Previous reports have documented a loss of
EAAT2 to ~50% of its normal level in SOD1 G85R transgenic mice
(17) by using whole spinal cord at end-stage disease. The current
study provides a thorough evaluation of EAAT2 at a time point
when motor neurons are intact histologically and physiologically,
as revealed by EMG/nerve conduction studies. At these "early"
time points, there was a patchy loss of EAAT2 expression around
motor neurons in the ventral gray areas of the spinal cord,
suggesting that the loss of EAAT2 may contribute to motor
neuron degeneration. Concomitant with decreased EAAT2
expression was a marked increase in gliosis, and by end stage,
where motor neuron loss is severe, EAAT2 was present at only
5-10% of normal levels in the ventral horn. Importantly, the
contribution of altered EAAT2 expression to neuronal death/injury
was demonstrated by a recent study where EAAT2
overexpression offered protection in SOD1 G93A mice.

We describe a transgenic rat model for ALS based on the
SOD1 G93A mutation. The clinical and pathological changes displayed
resemble the "high expressing" SOD1 G93A mice first
described by Gurney et al. (15), including a characteristic vacuolar
degeneration of the neuropil, which seems to occur just before
motor neuron degeneration, and aggregates staining with SOD1
and neurofilament. Proteins involved in protein folding as well as
degradation, Hsc70 and ubiquitin, are also present in these
aggregates in these transgenic rats. Notable differences between
the rat and mouse models, however, include a more rapid
progression of disease and the transient appearance of vacuoles
in the transgenic rat. The rapid decline of the SOD1 G93A rats to
end stage could account for the disappearance of vacuoles in
sections of the spinal cord that display severe motor neuron loss.


Huntington's disease (HD) is a late manifesting neurodegenerative disorder in humans caused by an
expansion of a CAG trinucleotide repeat of more than 39 units in a gene of unknown function. Several mouse
models have been reported which show rapid progression of a phenotype leading to death within 3-5 months
(transgenic models) resembling the rare juvenile course of HD (Westphal variant) or which do not present
with any symptoms (knock-in mice). Owing to the small size of the brain, mice are not suitable for repetitive in
vivo imaging studies. Also, rapid progression of the disease in the transgenic models limits their usefulness
for neurotransplantation. We therefore generated a rat model transgenic for HD, which carries a truncated
huntingtin cDNA fragment with 51 CAG repeats under control of the native rat huntingtin promoter. This is the
first transgenic rat model of a neurodegenerative disorder of the brain. These rats exhibit adult-onset
neurological phenotypes with reduced anxiety, cognitive impairments, and slowly progressive motor
dysfunction as well as typical histopathological alterations in the form of neuronal nuclear inclusions in the
brain. As in HD patients, in vivo imaging demonstrates striatal shrinkage in magnetic resonance images and a
reduced brain glucose metabolism in high-resolution fluor-deoxy-glucose positron emission tomography
studies. This model allows longitudinal in vivo imaging studies and is therefore ideally suited for the
evaluation of novel therapeutic approaches such as neurotransplantation.



INTRODUCTION

Huntington's disease (HD) is an autosomal dominant disorder
caused by an expanded and unstable CAG trinucleotide repeat
within the coding region of the HD gene (IT15) (1). The mutation
leads to a progressive degeneration of neurons primarily in
striatum and cerebral cortex. Clinically, HD is characterized by
movement abnormalities, cognitive impairments, and emotional
disturbances (2). In general, movement disturbances begin with
chorea. Depressed mood and more subtle deficits apparent in
neuropsychological tests may precede motor symptoms by years.
The disease progresses relentlessly until death within 15-20
years. No effective treatment to influence the onset or the
progression is presently available.

Many attempts have been made to generate animal models of
HD. Excitotoxin models replicate many of the histological and
neurochemical features as well as some of the motor and
cognitive signs of HD (3-5), but neurodegeneration is not truly
progressive. Therefore, their usefulness for the evaluation of
treatment effects is limited.

Transgenic animal models of HD (6-11) provide new ways
of studying the neuropathological mechanisms underlying
HD. In particular the R6/2 transgenic mouse line, which
expresses the first exon of the human HD gene carrying
141-157 CAG repeat expansions (6), develops a number of key
features of HD, including progressive motor deterioration
(12,13), appearance of neuronal intranuclear inclusions (14),
discriminative learning impairments (15), and altered
emotionality (16). However, R6/2 mice express very large
numbers of CAG repeats that are only found in the juvenile
type of HD. A rapid disease progression associated with
diabetes in R6/2 mice (13) is not typical for the adult-type
HD and may complicate the assessment of potential
therapeutic approaches. Although HD transgenic mice provide
important insights into the molecular basis of HD, there is
still a need for animal models which resemble the common
adult type of disease and which are more suitable for
repetitive in vivo imaging. These rapidly emerging techniques
offer the opportunity to compare directly the pathological
alterations of the human condition with the corresponding
animal model in longitudinal studies (17).

In this report, we describe the first transgenic rat model
bearing a human HD mutation with a high-end adult onset
allele of 51 CAG repeats that exhibits progressive neurological,
neuropathological and neurochemical phenotypes closely
resembling the common late manifesting and slowly progressing
type of disease. We demonstrate that HD transgenic rats
are well suited for complex behavioral studies and the
evaluation of in vivo progression markers using high-resolution
PET and MRI.

Generation of the HD transgenic rat model

A 1962 bp rat HD cDNA fragment (18) carrying expansions of
51 CAG repeats under the control of 885 bp of the endogenous
rat HD promoter (19) was used for microinjection (Fig. 1A).
Two founders were obtained and transgenic lines established.
Of these, we followed up line 2762 for more than 2 years and
found the CAG repeat length remaining stable in more than 147
meioses (data not shown). The mutant amino terminal portion
of huntingtin is expressed in the brain as shown by western blot
analysis (Fig. 1B), in particular in the frontal and temporal
cortex, the hippocampus, the basal ganglia, and the mesencephalon,
but at a much lower level in the cerebellum or the
spinal cord (Fig. 1C).



Figure 1. Transgene construct and huntingtin expression in transgenic rats. (A)
The first 154 bp of a partial huntingtin cDNA spanning 1962 bp of the N-terminal

of a HD paticot The c0NA 13 driven "by a 885 bp fragment of the rax HD pro mo tor 
(position -500 to -15 op) (19). A 200 bp fragment containing the SV40 polyftde- 
nylation signal was finally added downstream of the cDNA resulting in RHD/ 
PromSIA, (B) Western Wot analysis of brain tissue of taroBcraiic rat line 2771 
ana 2762 using polyclonal and-huntingtin antibody 675 demonstrates a 7SkDa 
product representing tbc expression of fbc uansgene although at a Inwer level than 
toe <?n dozens protein, Homozygoticxate (■+/+) express about double me amount 
of mc tnmsgene protein as hemteygoqc 1'UKS ). (Q Western blot analysis of 
tissue from current brain areas of transgenic rat line 2762 at the age of 6 months, 
demonstrating a 75 kDa product representing the expression of mc transgene m the 
frontal cortex, tbc temporal cortex, hippocampus, basal ganglia and mesencepha- 
lon, hut not in the cerebellum or the spinal cord. However, overexposure of tbe same 
western blot clearly demonstrates that the tronsgenc is oho expressed in the cerebel- 
lum and tbc spinal cord though at a much lower level (data not shown). 
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Slow progressive phenotypes with emotional, 
cognitive and motor dysfunction 

At birth we found transgenic rats and wild-type littermates 
phenotypically indistinguishable. Transgenic rats of both sexes 
are fertile without any sign of atrophy of the sexual organs. We 
observed a lower body weight gain in transgenic rats that was 
slowly progressive with the animals being about 20% lighter at 
the age of 24 months (Fig. 2A). At this age, transgenic rats 
commonly died after a 2 week period of rapid weight loss, 
which is associated with emaciation and muscular atrophy 
(Fig. 2B). Plasma glucose levels were always normal in routine 
screening (data not shown). 

Transgenic animals often showed opisthotonus-like 
movements of the head. No resting tremor, ataxia, clasping, 
vocalizations, dyskinesia or seizures were observed. Except 
for occasional dyskinetic movements of the head, overt 
behavioral abnormalities were only found on dedicated 
behavioral testing. 

At the age of 2 months transgenic rats developed a reduction 
of anxiety-like behavior in the elevated plus maze test (Fig. 2C), 
which is similar to the findings in R6/2 transgenic mice (16). At 
the age of 10 months transgenic rats showed cognitive decline 
in a spatial learning task for testing working memory in the 
radial maze (Fig. 2D and E). At the age of 5 months we had no 
indication of motor dysfunction in the animals (Fig. 2F), while 
at the age of 10 and 15 months progressive impairments of 
hind- and forelimb coordination and balance in the accelerod 
test were found (Fig. 2G and H). Thus, as in HD patients, 
emotional and cognitive impairments preceded progressive 
motor deterioration. 

Accumulation of huntingtin aggregates and nuclear 
inclusions in striatal neurons 

We examined whether mutant huntingtin forms aggregates and 
inclusions in the brain of 18-month-old rats using EM48, a 
rabbit antibody selective for mutant huntingtin (20,21). Most of 
the EM48 immunoreactive products appeared as punctate 
labeling in the striatum, especially in the ventral region near the 
lateral ventricles and in the caudal part (Fig. 3B). Occasionally 
EM48 labeled aggregates were observed in the cortex. Other 
regions including hippocampus and cerebellum showed very 
weak or no EM48 label. In wild-type animals no EM48 labeled 
aggregates or puncta were found (Fig. 3A). 

Two types of EM48 labeling, neuropil aggregates and nuclear 
inclusions, were observed. As in other HD animal models 
(11,22) and in HD patient brains (20) some neuropil aggregates 
were arranged in linear arrays and most of them were scattered 
(Fig. 3C). Single nuclear inclusions were mainly observed in 
the striatum (Fig. 3D), resembling other HD mouse models 
(14,21). Since the striatal projection neurons terminate their 
axons in the lateral globus pallidus (LGP), we also examined 
the caudal region of the striatum. Nuclear staining and neuropil 
aggregates were common in the striatum. In the LGP, however, 
most EM48 labeling existed as neuropil aggregates. 

To examine at what age mutant huntingtin forms aggregates 
and inclusions in the ventral region of the striatum, we 
additionally screened brains of 1-, 6-, 12- and 24-month-old 
rats for EM48 immunoreactive products (Fig. 3E-H). At the 
ages of 12 (Fig. 3G), 18 (Fig. 3A-D), and 24 months punctate 
labeling was evident, which was most pronounced at the age of 
24 months. No aggregates or inclusions were found in the brain 
of 1- and 6-month-old rats. 

Postmortem concentrations of tryptophan and 
biogenic amines 

Since altered tryptophan and dopamine metabolism is linked to 
HD, we examined neurochemical alterations in the transgenic 
HD rats using a highly sensitive HPLC method (23). Striatal 
dopamine levels were decreased only about 20% in 
heterozygotic rats whereas in homozygotic rats a reduction of nearly 
80% was found (Fig. 4A). The levels of dopamine and DOPAC 
in the parietal cortex of homozygotic animals were not 
significantly changed (Fig. 4B, D and E). Tryptophan 
concentrations were decreased 2-fold in striatum (Fig. 4E), 
but not significantly different in parietal cortex (Fig. 4F). 
Interestingly, the levels of xanthurenic acid were nearly depleted 
in the striatum of homozygotic transgenic rats (Fig. 4G) and 
undetectable in the parietal cortex (Fig. 4H). In contrast, in 
heterozygotes levels of xanthurenic acid were elevated in the 
parietal cortex (Fig. 4H), but unchanged in the striatum 
(Fig. 4G). No significant changes in other neurotransmitter 
levels were found. 

Focal lesions in the striatum, enlarged lateral ventricles, 
and reduced brain glucose metabolism 

To examine whether transgenic animals display 
neuropathological signs detectable by magnetic resonance (MR) imaging, 
we performed MR investigations on 8-month-old homozygotic 
HD rats. MR scans revealed enlarged lateral ventricles (Fig. 5C 
and D) and focal lesions in the striatum (Fig. 5F). 

Since clinical studies have consistently revealed reductions in 
striatal glucose metabolism, we studied the local cerebral 
metabolic rate of glucose (lCMRGlc) in transgenic rats using 
[18F]FDG (fluoro-deoxy-glucose) and a high-resolution small- 
animal PET (positron emission tomography). PET studies were 
accompanied by ex vivo [18F]FDG measurements in order to 
test their reliability. 

Harderian glands and different parts of the brain, such as 
olfactory bulb and caudato-putamen, were clearly 
distinguishable (Fig. 6). Individually co-registered MR images allowed a 
precise delineation of the whole brain as region of interest 
(ROI), as indicated by the red line (Fig. 6A and E). The defined 
ROI was measured in the co-registered PET image (Fig. 6B-D, 
F-H). Mean lCMRGlc values, as calculated from animal PET 
data of control animals, were 54.98 ± 15.53 [µmol/ 
(100 g × min)] for the whole brain. Mean lCMRGlc values of 
hetero- and homozygotic animals were lower than control 
values (see legend of Fig. 6). Metabolic abnormalities of 
homozygotic animals were significantly different from controls 
(P < 0.05). 

After completion of the PET scans we subsequently 
acquired lCMRGlc values ex vivo using [18F]FDG 
autoradiography (Fig. 6J and K). Similar to the in vivo situation 
determined by [18F]FDG-PET, mean ex vivo lCMRGlc values of 
homozygotic animals were significantly lower than control 
values (P < 0.05). A statistical comparison of autoradiographic 

Figure 2. Growth, survival, and behavioral phenotyping. Growth chart representing absolute body weight measured once a week of male wild-type (-/-; gray 
round symbols) and HD transgenic (+/-, dark squares and +/+, black triangles) rats from 1 to 24 months of age (A). Symbols indicate means ± SEM. A 
significant effect of genotype (P < 0.001) and a significant genotype × weight-gain interaction (P < 0.0001) indicate a progressive decline in body weight gain in HD 
transgenics. (B) Cumulative survival of male wild-type (-/-, dotted line) and HD transgenic (+/-, mixed dots/lines and +/+, line) rats from 1 to 24 months of age 
(end of study) using Kaplan-Meier estimator. Log-rank test revealed a P < 0.05. (C) Percentage of time spent on the open arms of the elevated plus maze. 
Transgenic rats (+/-, hatched columns and +/+, black columns) spent more time (**P < 0.001; ***P < 0.0001) on the open arms. (D, E) Radial maze behavior. 
During exploration of the radial maze, transgenic rats showed no major differences in preference for certain angles when choosing arms (D) suggesting that the 
animals have general motor, cognitive and sensory abilities sufficient to master this task. Activity (total of arm visits and total of time in arms) was not significantly 
changed (data not shown). Radial maze reinforced alternation demonstrated an increased number of arm visits required to collect all food pellets. The increased 
number of working memory (WM) errors (E) indicates that the transgene affected the ability to retain and manipulate mnemonic information to guide ongoing 
behavior (*P < 0.01; **P < 0.001). Bars indicate means ± SEM of each measurement across the trials. (F-H) Balance and motor coordination on the accelerating 
rod. The means ± SEM of the maximal speed (rpm) and the duration of balance (data not shown) were recorded. At the age of 5 months HD transgenic rats were 
not significantly impaired in their ability to stay on the rotating rod (F). At the age of 10 and 15 months HD transgenic rats exhibit difficulty and a progressive 
decline in performance on the accelerod (G-H). Asterisks indicate significant differences between wild-type (-/-) control and homo- as well as heterozygotic HD 
transgenic rats (*P < 0.01; **P < 0.001). 



and animal PET data indicated that lCMRGlc values were 
significantly similar (P < 0.05). 
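
A comparison of this kind, with per-animal lCMRGlc values from PET plotted against the matching autoradiography values, can be summarized by an ordinary least-squares fit. The sketch below is only an illustration of that calculation. The function name `linear_regression` and the paired values are hypothetical and do not reproduce the study data or the authors' software.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical paired values in umol/(100 g x min); NOT data from the study.
pet_values = [31.0, 38.5, 44.0, 52.5, 58.0, 61.5]
autoradiography_values = [40.0, 45.5, 52.0, 60.5, 64.0, 70.5]

slope, intercept, r = linear_regression(pet_values, autoradiography_values)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
```

A slope near 1 with a high correlation coefficient would indicate that the two methods track each other. A constant offset shows up in the intercept.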



DISCUSSION 

In this report we describe the first transgenic rat model for 
Huntington's disease, which displays symptoms similar to the 
most frequent late-onset form of HD. It should be emphasized 
that these transgenic rats represent the first animal model of a 
human neurodegenerative disorder of the brain per se and that 
these animals express a high-end adult-onset HD allele, which 
is associated with a slow disease progression and pathology 
restricted to the striatum. Other symptomatic transgenic mice, 
however, express very large repeats that are only found in 
juvenile HD patients. Thus, these HDtg rats are especially 
useful for studying pathological changes that may be 
commonly present in the majority of adult HD patients, 
making this rat model more valuable than other mouse models 
in evaluating novel therapeutics on HD. 

Transgenic rats develop slowly progressive phenotypes with 
emotional, cognitive, and motor deteriorations. The emotional 
disturbance is characterised by a reduction of anxiety, which 
resembles similar observations in R6/2 HD transgenic mice 
(16). Cognitive decline is also a feature of HD (24). Early in the 
course of HD, patients frequently show impairments of spatial 
working memory (25), and comparable deficits are also found 
in R6/2 mice (15,26) as well as in our HD transgenic rats. 
These data suggest a common underlying neuropathological 
mechanism in HD and corresponding animal models. 

Neuropathological examination revealed nuclear inclusions 
and neuropil aggregates. EM48 labeled aggregates are mainly 
found in the striatum of transgenic rats at the age of one year 
and older. EM48 labeling shows a distribution pattern similar to 
that in the human condition (20). Similar results were 
previously reported in HD knock-in mice expressing full-length 

Figure 3. EM48 immunostaining of brains of wild-type and HD transgenic rats. 
(A, B) Low magnification of micrographs of wild-type (A) and HD (B) rat 
brains. Note that EM48 immunoreactive product is particularly enriched in 
the ventral part of striatum (Str) near the ventricle (arrow) in HD rat brain. 
Ctx, cortex. Scale bar, 50 µm. (C) In the caudate part of the striatum of HD rats, 
many nuclear aggregates and small neuropil aggregates are evident. Neuropil 
aggregates (arrows) are also present in the lateral globus pallidus (LGP). 
Scale bar, 25 µm. (D) High magnification of micrograph showing that both 
EM48 labeled nuclear inclusion (arrowheads) and small neuropil aggregates 
(arrows) are present in the striatum of HD rat brain. (E-H) Corresponding 
micrographs of coronal sections at the level of the bregma of 1-month (E), 
6-month (F), 12-month (G), 24-month (H) old transgenic rats. Scale bar, 10 µm. 



mutant huntingtin under the endogenous mouse HD promoter 
(21,27). A remarkable observation in neurochemistry was 
that xanthurenic acid was nearly completely depleted in the 
striatum and the parietal cortex. The levels of xanthurenic acid 
were higher in the less afflicted heterozygotes, perhaps 
reflecting a neurochemical defense mechanism against the 
excitotoxicity of the overactive indoleamine (2,3)-dioxygenase 
pathway (28). Similar to HD patients, the levels of tryptophan 
were decreased in the striatum of homozygotes. Decreased DA 
and normal DOPAC levels are indicative of increased DA 
turnover. Decreased levels of tryptophan may be related to an 
increased formation of quinolinic acid, a neuroexcitant molecule 
with neurotoxic properties (5). These findings support the 
hypothesis that both increased formation of quinolinic acid (28) 
and decreased production of neuroprotective metabolites from 
tryptophan (29) may be relevant to the pathogenesis of HD. 

An important feature of the presented HD rat model is its 
suitability for in vivo metabolic and structural imaging, which 
cannot yet be achieved with transgenic mice. MR scanning 
demonstrated an enlargement of the lateral ventricles and focal 
signal abnormalities in the striatum of HD transgenic animals, 

Figure 4. Regional alterations in tryptophan metabolism in HD transgenic rats. 
The levels of dopamine (A, B), DOPAC (C, D), tryptophan (E, F) and 
xanthurenic acid (G, H) in striatum (A, C, E, G) or parietal cortex (B, D, F, H) in 
wild-type (-/-) or homo- (+/+) and heterozygotic (+/-) transgenic rats expressing 
human HD mutation at the age of 18 months. Asterisks indicate significant 
differences from control rats (*P < 0.05, **P < 0.001, ***P < 0.0001). 



although quantitative assessment of striatal neurons revealed no 
significant cell loss. This indicates that striatal atrophy depicted 
by MR imaging is rather a consequence of shrinkage than 
neuronal death. In high-resolution animal PET we found a 
significant reduction of brain glucose metabolism in 2-year-old 
homozygotic HD rats. In late stages of human HD, clinical PET 
studies consistently revealed reduced lCMRGlc in the striatum 
(30,31). Thus, this report provides evidence that the novel HD 
transgenic rat model does closely resemble the human 
pathological condition. It is suited for non-invasive in vivo 
investigations of brain metabolism and most probably of further 
in vivo parameters (e.g. receptor density, enzyme activity). 
Brain atrophy and extracranial tracer accumulation, however, 
necessitate the application of high-resolution tomographs and a 
careful evaluation of partial volume and spill over effects. 

We report the successful development of a transgenic rat 
model of HD, which expresses a high-end adult onset HD allele 
with 51 CAG repeats and which exhibits a high degree of 
similarity to the most frequent adult type of the disease, thereby 
permitting in vivo monitoring of individual disease progression 
by high-resolution imaging (PET and MRI). For the first time it 
is now possible to follow up disease progression in longitudinal 
in vivo studies and to monitor the effects of long-term 
treatments, microsurgery, neuronal cell transplantation, or 
antisense approaches on the course of experimental HD. 



MATERIALS AND METHODS 

Generation of transgenic rats 

To generate the transgene construct, PCR was performed using 
DNA from a HD patient (19/51 CAGs) with primer Hu 4 

Figure 5. MR scanning of brains of wild-type and HD transgenic rats. (A-D) 
MR scans of lateral ventricles in coronal (A) and horizontal (B) projection of 
wild-type (A, B) and HD (C, D) rat brain at the age of 8 months. MR scans 
of the striatum of a wild-type (E) and an HD transgenic (F) rat brain. Note 
the enlargement of the lateral ventricles (arrows) and the focal lesions in the 
striatum (arrows). 



(ATGGCGACCCTGGAAAAGCTGATGAA) and Hu3-510 
(GGGCGCCTGAGGCTGAGGCAGC). This PCR product 
was subsequently digested with Eco81I. The first 154 
nucleotides of the cDNA RHD10 containing nt 1-1962 of 
the rat HD-gene (18) were removed by restriction of the clone 
with EcoRI and Eco81I. This fragment was replaced by the 
PCR product. Subsequently, an 885 bp rat HD promoter 
fragment from position -900 to -15 (19) was ligated upstream 
of the cDNA and a 200 bp fragment containing the SV40 
polyadenylation signal was added downstream of the cDNA 
resulting in RHD/Prom51A. The insert was excised with XbaI 
and SspI out of the cloning vector and microinjected into 
oocyte donors of Sprague-Dawley (SD) rats (32,33). Tail DNA 
was extracted from each of the offspring and Southern blots of 
EcoRI digested DNA were performed to screen for founders. 

For western blot analysis, frozen brain halves and dissected 
brain areas were homogenized and protein extracted. Protein 
extracts were subjected to SDS-PAGE and blotted 
electrophoretically onto Immobilon-P membranes. Detection of 
huntingtin protein was performed basically as described (34) 
using the polyclonal anti-huntingtin antibody 675. 

Behavioral phenotyping of the HD transgenic rat line 

The considerations for behavioral phenotyping of transgenic 
and knockout mice (35) were adapted with specific modifica- 
tions for testing rats. All procedures were approved by the 
Government of Lower Saxony in Hannover, Germany, and 
performed in compliance with international animal welfare 
standards. The elevated plus maze (TSE-Systems, Bad 
Homburg, Germany) was equipped with light beam sensors 
and had two open arms (50 × 10 cm) and two enclosed arms of 
the same size. The experiment was conducted with 2-month-old 
rats as previously described (36). An increase of the time 



Figure 6. Studies with [18F]FDG and high-resolution small-animal PET. 
Representative images with [18F]FDG and high-resolution small-animal PET 
in horizontal (B-D) and coronal (F-H) planes along with individual MR 
images (A, E) and ex vivo autoradiographs (J, K). Individual MR images (A, 
E) of a control animal are co-registered with respective [18F]FDG-PET images 
(B, F). Planes are cutting the caudato-putamen level of the brain. Representative 
sections of ex vivo autoradiography (J, K) are taken from identical animals as in 
[18F]FDG-PET (B, F; D, H). The rat brain is defined within the [18F]FDG-PET 
on the basis of individually co-registered MR images as indicated by the red 
line. Local cerebral rates of glucose metabolism (lCMRGlc) are absolutely 
quantified (see color and black/white bars). The high accumulation of activity 
in caudato-putamen is clearly visible in [18F]FDG-PET (F, G, H) and ex vivo 
autoradiography (J, K). Homozygotic animals exhibit significantly (P < 0.05) 
lower lCMRGlc values compared with controls, both in [18F]FDG-PET 
[34.54 ± 18.52 µmol/(100 g × min) versus 54.98 ± 15.53 µmol/(100 g × min)] 
and in ex vivo autoradiography [43.$4 ± 6.77 µmol/(100 g × min) versus 
63.02 ± 8.24 µmol/(100 g × min)]. 



spent in the open arms is interpreted as an anxiolytic-like 
response. An automated sensor-equipped eight-arm radial maze 
(TSE) was used to measure learning and memory in an 
experimental design testing exploring behavior and working 
memory (WM) errors in allocentric orientation (37). An 
accelerating rotarod for rats (TSE) was used to measure fore- 
and hind-limb motor coordination and balance. Training 
consisted of three trials per day on four consecutive days. 
The duration of each trial was 5 min on accelerating mode of 
the apparatus. The maximal speed level and the mean latency to 
fall off the rotarod were recorded on three consecutive tests. 
Data were subjected to one- or two-way ANOVA with one 
between-subject factor (genotype) and with repeated 
measurements on one or more factors depending on the test used. The 
PLSD test was used for post hoc comparisons. Cumulative 
survival was calculated by means of Kaplan-Meier analysis. A 
critical value for significance of P < 0.05 was used throughout 
the study. 
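
For readers unfamiliar with the Kaplan-Meier analysis mentioned above, the estimator can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `kaplan_meier` and the follow-up times are hypothetical, and the study itself would have used a statistics package rather than this code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : follow-up time per animal (e.g. months)
    events : 1 if the animal died at that time, 0 if it was censored
    Returns a list of (time, survival probability) at each death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # animals leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk animals surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical cohort of six animals followed to 24 months (end of study).
months = [18, 20, 22, 24, 24, 24]
died = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(months, died):
    print(f"month {t}: S = {s:.3f}")
```

Animals still alive at the end of the study are treated as censored, so they contribute to the risk set without counting as deaths. Curves for two genotypes computed this way can then be compared with a log-rank test.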



Immunohistology and light microscopic examination 

Brains of HD transgenic rats and controls at the age of 1, 6, 12, 
18 and 24 months were perfused intracardially with PBS 
followed by paraformaldehyde and postfixed. Free-floating 
sections were pre-blocked in normal goat serum, Triton-X and 
avidin, and incubated with EM48 antibody (1:400 dilution) at 
4°C for 24 h (20,21). The EM48 immunoreactive product was 
visualized with the avidin-biotin complex kit (Vector ABC 
Elite, Burlingame, CA, USA). 

Analysis of neurotransmitters from post-mortem 
tissue samples 

Tryptophan and its kynurenine, catechol- and indoleamine 
metabolites were measured by electrochemical HPLC, as 
described previously (23). Briefly, striatum and parietal cortex 
of 18-month-old transgenic HD rats were dissected, weighed 
and sonicated in perchloric acid. The homogenate was 
centrifuged and 20 µl of supernatant was injected into a 
HPLC system (ESA model 5600 CoulArray module, 
Chelmsford, MA, USA) with two coulometric array cell 
modules, each with four working electrodes. The 
chromatographic separation was achieved on an ESA MD-150 
reversed-phase C18 analytical column with a Hypersil pre-column. 

MR scanning 

Rats were anesthetized with 2% isoflurane and fixed in a 
stereotaxic frame. MRI was performed on a 4.7 T Bruker 
Biospec scanner with a free-bore of 20 cm equipped with an 
actively RF-decoupled coil system. A whole-body birdcage 
resonator enabled homogeneous excitation, and a 3 cm surface 
coil was used as receiver. T2-weighted spin echo images were 
acquired using a rapid acquisition relaxation enhanced (RARE) 
sequence (38). Eleven axial and seven coronal slices were 
measured (slice thickness: 1.5 mm axial; 1.3 mm coronal; field 
of view, 3.2 × 3.2 cm; matrix, 256 × 256; TR/TE 3000/19 ms; 
six averages). 
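
The in-plane resolution implied by these acquisition parameters follows from dividing the field of view by the matrix size. The short sketch below works through that arithmetic for the 3.2 cm field of view, the 256 × 256 matrix and the 1.5 mm axial slice thickness. The function name `voxel_size_mm` is an illustrative assumption.

```python
def voxel_size_mm(fov_mm: float, matrix: int, slice_mm: float):
    """Return (in-plane pixel edge in mm, voxel volume in mm^3)."""
    in_plane = fov_mm / matrix
    volume = in_plane * in_plane * slice_mm
    return in_plane, volume


# Field of view 3.2 cm = 32 mm, matrix 256 x 256, axial slice thickness 1.5 mm.
edge, volume = voxel_size_mm(fov_mm=32.0, matrix=256, slice_mm=1.5)
print(f"in-plane pixel: {edge} mm, voxel volume: {volume} mm^3")
```

With these inputs the in-plane pixel edge is 0.125 mm, so the slices are much thicker than the pixels are wide. That anisotropy is worth keeping in mind when judging partial volume effects in small structures.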

PET studies 

PET imaging was performed on a dedicated high-resolution 
small-animal PET scanner ('TierPET') as previously described 
(39) on 24-month-old homozygotic (+/+; n = 6) and 
heterozygotic animals (+/-; n = 7), as well as age-matched controls 
(-/-; n = 6). Reconstructed image resolution was 2.1 mm, 
which is homogeneously maintained throughout the entire field 
of view. A precise anatomical identification of rat brain regions 
was achieved by co-registration of magnetic resonance (MRI; 
Siemens Magnetom, 1.5 T, equipped with a dedicated small 
limb coil) and PET images. Animals received an injection of 
0.3 ml [18F]FDG (1 mCi/ml, dissolved in NaCl 0.9%) under 
isoflurane sedation. After 30 min animals were anesthetized 
with ketamine/xylazine and glucose concentrations and input 
function were detected by serial blood samples. After a 60 min 
PET scan brains were removed and immediately frozen. 
Cryostat sections (20 µm) were exposed to a phosphor imaging 
plate (BAS-SR 2025, Fuji, Germany) together with calibrated 
fluorine-18 brain paste standards. Imaging plates were scanned 
with a high-performance imaging plate reader (BAS5000 
BioImageAnalyzer, Fuji, Germany; spatial resolution, 50 µm). 
Local cerebral metabolic rate of glucose (lCMRGlc) was 
calculated on the basis of the operational equation used in 
2DG autoradiography studies (40) with modified rate and 
lumped constants to account for the difference in kinetic 
characteristics between FDG and 2DG. The following 
constants (41) were used: k1 = 0.30; k2 = 0.40; k3 = 0.068; 
lumped constant, LC = 0.60. Similarity of lCMRGlc as 
determined by FDG-PET and ex vivo autoradiography was 
analyzed by linear regression analysis. 

CAG/polyglutamine expansion has been shown to form the molecular basis of an increasing number of inherited 
neurodegenerative diseases. The mutation is likely to act by a dominant gain of function but the mechanism by which 
it leads to neuronal dysfunction and cell death is unknown. The proteins harbouring these polyglutamine tracts are 
unrelated and without exception are widely expressed with extensively overlapping expression patterns. The factors 
governing the cell specific nature of the neurodegeneration have yet to be understood. Upon a certain size threshold, 
expanded CAG repeats become unstable on transmission and a modest degree of somatic mosaicism is apparent. 
Similarly, the molecular basis of the instability and its tissue specificity has yet to be unravelled. Recent reports 
describing the first mouse models of CAG/polyglutamine disorders indicate that it will be possible to model both the 
pathogenic mechanism and the CAG repeat instability in the mouse. This has great potential and promise for 
uncovering the molecular basis of these diseases and developing therapeutic interventions. 



INTRODUCTION 

Huntington's disease (HD) (1) is one of an increasing number of 
neurodegenerative disorders caused by a CAG/polyglutamine 
(polygln) repeat expansion, including spinal and bulbar muscular 
atrophy (SBMA) (2), dentatorubral pallidoluysian atrophy 
(DRPLA) (3,4) and spinocerebellar ataxia (SCA) types 1 (5), 2 
(6-8), 3 (9) and 6 (10). The inheritance patterns are autosomal 
dominant (with the exception of X-linked SBMA) and in each case, 
the proteins can tolerate a large variation in the size of the polygln 
tracts in the normal range but upon a certain size (~37-40 
glutamines) these tracts become pathogenic. It is likely that the novel 
molecular pathways initiated by this mutation have a common basis 
(except possibly in the case of SCA6 in which the pathogenic 
threshold is smaller). The proteins harbouring the polygln stretches 
are mostly novel and otherwise unrelated. In all cases the proteins 
are widely or ubiquitously expressed, but despite extensively 
overlapping expression patterns, the neuronal cell death is relatively 
specific and can differ markedly (reviewed in 11). 

The molecular events by which a polygln expansion causes cell 
death remain to be unravelled (reviewed in 12). These mutations 
are likely to act by a dominant gain of function, this mechanism 
being supported by the identification of the 1C2 antibody which 
specifically detects polygln expansions, suggestive of a 
conformational change at a certain size threshold (13). In 
addition, the factors which convey the specific and differing 
patterns of cell death between these diseases are not understood. 
Possible mechanisms include differences in expression levels, 
subcellular localisation of the mutated protein or cell specific 
subcellular interactions. A number of proteins have now been 
reported to interact with huntingtin which include HAP1 (14), 
HIP-1 (15), a specific ubiquitin-conjugating enzyme (16) and 
GAPDH (17). It is yet to be established whether any of these 
proteins play a role in the pathogenic mechanism. Huntingtin has 
also been shown to be specifically cleaved by apopain, a cysteine 
protease with a key role in the proteolytic events leading to 
apoptosis (18). Similarly, it is not clear if this participates in the 
chain of events leading to neurodegeneration. 

Expanded triplet repeats are invariably unstable when inherited 
from one generation to the next and they generally show varying 
degrees of somatic mosaicism. The intergenerational instability 
forms the molecular basis of anticipation: the observation that the 
age of onset of a disease decreases and/or the severity increases 
as the gene is passed from one generation to the next. Repeat 
instability on transmission has been described in all of the CAG 
repeat diseases and, in general, repeats tend to be more unstable 
on paternal transmission. This may present as larger increases on 
paternal inheritance as in HD (19) (reflected in the paternal sex 
bias to the anticipation) or as a tendency to increase on male and 
decrease on female transmission as in SCA1 (20). A relatively 
modest degree of somatic repeat instability has been identified in 
HD, DRPLA, SCA1 and MJD. In general, expansions have been 
identified in regions of the CNS, with the exception of the 
cerebellum which presents a smaller repeat relative to the other 
brain regions tested (21-26). Of non-CNS tissues, instability has 
consistently been reported in liver and kidney (21,24-26) and 
also in muscle, lung, testis (21), leukocytes (23) and colon 
(24,26). Studies of DRPLA patients also identified a significant 
correlation between the range of the expanded allele and the age 
at death of the patient rather than with the onset of disease (25). 
The molecular events governing triplet repeat instability are not 
understood and possible mechanisms must address both a CAG 
repeat size threshold and cell specificity. 

TRANSGENIC MODELLING OF HUNTINGTON'S 
DISEASE 

It has been proposed for many years that the HD mutation most 
probably acts through a dominant gain of function. Analysis of 
mice arising from the first transgenic models of HD, SCA1 and 
MJD in addition to gene targeted knockouts of the mouse HD 
gene (Hdh) supports this hypothesis. 

Knockouts of the mouse HD gene 

Three research groups have independently generated knockouts 
of the mouse HD gene (Hdh) (27-29). In all cases the nullizygous 
phenotype was embryonic lethal, clearly demonstrating that the 
HD gene plays an important role in development. In two of these 
studies, heterozygous mice expressing only one copy of Hdh were 
phenotypically normal (28,29). In contrast, Nasir et al. (27) 
reported that heterozygotes showed increased motor activity and 
cognitive deficits with a significant neuronal loss in the 
subthalamic nucleus. To explain this discrepancy, it has been 
suggested that the targeted allele (targeted to replace exon 5) may 
allow the production of a truncated protein which could 
conceivably cause a dominant effect in the heterozygous mice, 
generating a phenotype. Together, these studies demonstrate that 
HD is not caused by haploinsufficiency (loss of function of one 
copy of the gene) or a simple dominant-negative mechanism. In 
the first case, loss of one allele and in the second case, loss of both 
alleles, would be expected to generate a model of HD. 

Transgenic models of HD 

A dominant gain of function mechanism would predict that a 
mouse model of a polygln neurodegenerative disorder could be 
generated by the introduction into the mouse germ line of a 
mutant copy of the gene in question, irrespective of the presence 
of two copies of the endogenous mouse homologue. The first 
description of HD transgenic mice used a full length cDNA 
construct under the control of a CMV promoter carrying (CAG)44 
(30). Of the HD transgenes, 2/6 founders expressed high levels of 
transgene mRNA but a transgene protein was not detected. Whilst 
these results could be interpreted as providing evidence that 
translation of the CAG repeat into a polygln expansion is 
necessary for pathogenesis, the repeat expansion in this 
experiment is comparatively modest and it is possible that larger 
expansions are necessary to generate a phenotype with an age of 
onset that falls within the lifetime of a mouse. 

Genomic clones are frequently more successful at generating 
transgenic models than cDNAs as they often direct an expression 
profile that mimics the endogenous gene. The large size of the HD 
gene (170 kb) necessitates that genomic constructs are prepared 
and manipulated in the form of yeast artificial chromosomes 
(YACs). Using YAC technology, Hodgson et al. (31) have
successfully generated mice that are transgenic for the normal
human HD gene. They have crossed the human HD transgene
onto an Hdh nullizygous background and shown that the human
YAC can rescue the embryonic lethal phenotype. This indicates
that the transgene is expressed appropriately and predicts that the
introduction of a mutant version of the human YAC would be 
successful in generating a model of HD. 

Mice transgenic for a mutant version of exon 1 of the
HD gene

We have described four lines of mice that are transgenic for exon
1 of the HD gene carrying CAG expansions of 115-156 (R6/1,
R6/2, R6/5 and R6/0) and a further two lines transgenic for the
same construct carrying 18 repeats (HDex6 and HDex27) (32).
The transgene is ubiquitously expressed at both the RNA and
protein levels in all lines except R6/0, in which no evidence of
expression has been detected (32,33). The transgene protein
contains the first 69 amino acids of huntingtin in addition to the
number of residues encoded by the CAG repeat (i.e. ~3% of
huntingtin).

A progressive neurological phenotype has been observed in
three lines: R6/1, (CAG)115; R6/2, (CAG)145; and R6/5,
(CAG)130-155. Line R6/2 has an onset of ~2 months, line R6/1 at
~5 months and R6/5 hemizygotes do not show symptoms after >1
year. Lines R6/1 and R6/5 show an earlier age of onset and more
rapid progression of the disease when bred to homozygosity. The
phenotype includes an irregular gait, resting tremor, stereotypic
and abrupt, irregularly timed movements and epileptic seizures.
Coincident with the onset of the motor disorder there is a
progressive reduction in body weight in the transgenes as
compared with their littermate controls. The absence of a
phenotype in lines R6/0, HDex6 and HDex27 suggests that
expression of the polygln expansion forms the molecular basis of
the phenotype rather than the expression of a novel peptide. It is
notable that the R6 phenotype does not include an overt cerebellar
ataxia as described for the spinocerebellar ataxia lines (34,35)
(see below). Extensive neuropathological analysis has been
performed on the brains of R6/2 mice. At 12 weeks, the only
difference that could be identified between the transgene and
control brains was that the R6/2 brains were ~20% smaller, and
that this reduction in brain size occurred across all structures with
an apparently normal neuronal density. This is consistent with
early changes that occur in the brains of HD patients. More
recently, immunocytochemistry with huntingtin N-terminal
antibodies has identified the presence of neuronal intranuclear
inclusions (NII) in the brains of symptomatic transgenic mice (33).

COMPARISON WITH OTHER CAG/POLYGLN
MOUSE MODELS

Mice transgenic for both SCA1 (34) and MJD (35) constructs have
also been reported to develop a phenotype. A summary of the main
features of these and the R6 transgenes is presented in Table 1. The
SCA1 transgenes were the first demonstration that modelling a
polygln repeat disorder would be possible in the mouse. They
included mice transgenic for the SCA1 cDNA carrying either a
normal interrupted allele of (CAG)12CATCAGCAT(CAG)15
(PS-30) or an expanded uninterrupted allele of (CAG)82 (PS-82)
under the control of the pcp2 promoter (Purkinje cell-specific)
(34). Five of six PS-82 lines showed RNA expression between 10-
and 100-fold of endogenous levels. In the original report, transgene
protein could not be detected but this has since been shown to be
present by immunocytochemistry (H. Orr and H. Zoghbi, personal
communication). Mice from all five lines developed ataxia. Onset
varied from 12 to 26 weeks and a dosage effect was apparent: in
two lines studied, homozygotes were more severely affected than
hemizygotes. Neuropathological analysis showed significant loss
of the Purkinje cell population, with Bergmann glial proliferation,
and shrinkage and gliosis of the molecular layer. Ectopic Purkinje
cells were present in the molecular layer and occasionally the
granular layer and the dendritic arrays also appeared to be
abnormal.

Ikeda et al. (35) used expression constructs containing the MJD
cDNA carrying 79 CAG repeats (MJD79), the CAG repeat
followed by only the C-terminus of the MJD gene with both 79
and 35 CAG repeats (Q79C and Q35C) and a 79 CAG repeat in
isolation (Q79) under the control of a Purkinje cell-specific
promoter. Ataxia was observed in 3/3 Q79C and 2/6 Q79
transgenic mice, occurring as early as 1 month of age after full
activation of the promoter. In contrast, a phenotype was not
observed in any of the ten Q35C or four MJD79 transgenic mice
as of 7 and 5 months, respectively. Neuropathological analysis of
a 2 month old ataxic Q79C mouse showed a very atrophic
cerebellum, in which all three layers were affected. The authors
suggest that comparison of their data with that of Burright et al.
(34) indicates that the polygln tracts are more toxic when present
in isolation or in the context of a truncated protein. In the absence
of any data relating to expression levels it is difficult to come to
strong conclusions with regard to the comparative toxicity of the
full length and truncated constructs. However, these conclusions
were strongly supported by a series of transient transfections of
COS cells described in the same paper (35).

MOUSE MODELS OF CAG/CTG REPEAT STABILITY 

Repeat stability studies carried out on the first mice transgenic for
CAG repeat expansions showed no evidence of instability and
suggested that the molecular mechanism underlying triplet repeat
instability in humans may not exist in the mouse. These initial
studies included (CAG)45 in the androgen receptor cDNA (36),
(CAG)44 in the HD cDNA (30), (CAG)82 in the SCA1 cDNA (34)
and (CAG)79 in constructs based on the MJD/SCA3 cDNA (35).

Triplet repeat instability in lines transgenic for the HD
mutation (R6)

The R6 lines transgenic for exon 1 of the HD gene carrying
(CAG)115-(CAG)155 expansions showed both intergenerational
and somatic repeat instability (32,37). The repeats were clearly
unstable on transmission in lines R6/1, R6/2 and R6/5, although
this was less clear in line R6/0 as the changes observed in this line
could be accounted for by errors in sizing. In line R6/2 the degree
of instability increases with the age of the transmitting male (as
R6/2 females are sterile it was not possible to look for an age effect
on female transmission). R6/5 was the only line in which an
extensive comparison of instability on both male and female
transmission was conducted and the repeats had a tendency to
increase on male transmission and decrease on female transmission
(37). This trend was supported by the intergenerational instability
observed in the other lines. The CAG expansions introduced into
these mice are considerably larger than are normally seen in HD
patients. The change in size of the repeat on transmission in the
mice is smaller than would be expected from comparison with size
changes associated with highly expanded CAG repeats seen in
humans. The discrepancy in the degree of instability between
humans and mice may reflect the difference in their life span, a
model supported by the observation that the size of the
intergenerational expansion increased with the age of the
transmitting male.

Somatic instability was detected in lines R6/1, R6/2 and R6/5 but
not in line R6/0 (Fig. 1). In all three lines, onset of instability was
at ~6 weeks and the CAG repeat range increased with the age of
the mouse. This argues against a pathogenic role for repeat
instability as the age of onset of symptoms in these lines differs
markedly. The pattern of instability was more widespread in some
lines than others although on the whole it was first present and most
prominent in brain regions. Peripheral tissues that consistently
showed instability included liver and kidney. Overall the somatic
instability was comparable with that described in individuals
carrying CAG expansions (21-26). The major difference between
line R6/0, in which instability was not apparent, and the other lines
was the absence of transgene expression. This is probably due to
gene silencing by a position effect as the R6/0 transgene has clearly
integrated into a region of unusual genomic structure (32).

Other mouse models of triplet repeat instability 

Triplet repeat instability has also been reported in two series of
lines transgenic for the myotonic dystrophy (DM) mutation (CTG
on the sense strand) (38,39). The integration fragments used in
these lines were a genomic fragment (Dmt162) from the myotonic
dystrophy (DM) locus containing a small portion of the coding
region and the 3' UTR with (CTG)162 (39) and a cosmid (DM55-5)
containing the myotonic dystrophy protein kinase gene (DMPK)
with (CTG)55 and the flanking DMR-N9 and DMAHP genes (38).
Intergenerational instability was observed in both of these cases.
The DM55-5 transgenes showed intergenerational instability in
6.8% of transmissions, the changes generally being expansions of
one repeat unit. A higher frequency of unstable transmissions was
observed in the Dmt lines (as in the R6 lines), most likely as a
consequence of the larger size of the repeat tracts.



Table 1. Summary of CAG/polyglutamine transgenic mouse lines in which a progressive neurological phenotype has been observed

Disease  Construct            Promoter  (CAG)n   Expression        Phenotype  Freq. of lines
                                                 RNA     Protein              showing phenotype
HD       exon 1 (genomic)     HD        18       +       +         none       0/2
HD       exon 1 (genomic)     HD        142      -       -         none       0/1
HD       exon 1 (genomic)     HD        115-156  +       +         +          3/3
SCA1     cDNA (full length)   pcp2^a    30^b     +                 none       0/1
SCA1     cDNA (full length)   pcp2^a    82       +       +         +          5/6
MJD      cDNA (full length)   L7^a      79       NR      NR        none       0/4
MJD      cDNA (C-terminus)    L7^a      35       NR      NR        none       0/10
MJD      cDNA (C-terminus)    L7^a      79       NR      NR        +          3/3
MJD      polyglutamine tract  L7^a      79       NR      NR        +          2/6

NR, not reported.
^a Purkinje cell-specific promoter.
^b Interrupted repeat: (CAG)12CATCAGCAT(CAG)15.






1636 Human Molecular Genetics, 1997, Vol. 6, No. 10 Review



Figure 1. Illustration of the CAG repeat somatic instability seen in the R6 lines. The repeats were amplified by PCR using a fluorescent primer and sized on an ABI
sequencer using the Genescan and Genotyper software packages (37). In each case the genescan trace arising from a range of tissues at the age at which the mouse
was culled is compared with the trace obtained from tail DNA taken at 3 weeks (top row). The sizes of the major peaks in the tail traces are: R6/1, 115; R6/2, 145; R6/0,
142; R6/5, range from 123 to 156. The R6/5 line contains four copies of the CAG repeat and the difference in the tail trace between the two R6/5 mice has arisen from
germ line instability. It is clear that even after 38 weeks there is no evidence for somatic instability in line R6/0.



The Dmt lines [(CTG)143-162], like the R6 lines
[(CAG)115-155], showed a tendency to repeat expansion on male,
and contraction on female, transmission. It would appear that the
instability seen in the Dmt lines parallels that seen in the R6 lines
and represents more closely instability seen in some of the
CAG/polygln neurodegenerative disorders rather than that seen
in DM. Myotonic dystrophy is caused by a CTG expansion which
expands to between (CTG)100 and (CTG)4000 in the adult and
congenital forms of the disease with a maternal bias to the
anticipation (40). Somatic instability was described in one of the
DM55-5 transgenes which had additional repeat bands in brain,
liver, kidney and eye. A similar pattern of instability was also seen
in one of the progeny of this mouse, with most instability apparent
in sperm (38).



Comparison of the R6, Dmt and DM55-5 lines with the
transgenic lines that do not exhibit CAG/CTG repeat instability
does not lead to an understanding of the molecular basis of
instability. The absence of instability observed in the first four
reports could not simply be due to a size threshold effect. It is not
clear whether the size threshold in the mouse is larger than that
seen in humans; however, it must be below 55 repeats as a
moderate amount of instability was seen in the DM55-5
transgenes. Similarly, the absence of instability in the first four
lines cannot be due to differences in trans-acting factors which
are likely to be invariant. If cis-acting sequences are important,
the analysis of four series of lines (30,34-36) which do not show
instability and three series (37-39) which fairly consistently do
would suggest that these sequences are likely to be present on the
transgenes themselves rather than at the integration sites. The
absence of instability in the R6/0 line as compared with the other
R6 lines could have mechanistic implications. The R6/0 line only
differs from the other three at the site of integration which is
probably acting to silence the expression of the transgene. This
argues against a model in which contractions and expansions of
the repeat occur through a mechanism linked to DNA replication.
It raises the possibility that the instability is linked to expression
which may be a consequence of the open configuration of
chromatin leading to DNA damage rather than being directly
linked to transcription.

CONCLUSION 

It is clear that a polygln expansion can give rise to a progressive
neurological phenotype in the mouse. The analysis of existing and
further transgenic models of CAG/polygln repeat disease will be
informative with respect to uncovering the molecular basis of
these disorders. Comparison of transgenes arising from full
length and truncated constructs may resolve the speculation that
the toxic agent is a truncated version of the proteins in question.
The models will be useful in allowing the study of the early
disease stages for which patient material is rarely available.
Comparison of future models in which the transgenes are under
the control of endogenous or ubiquitous promoters may shed light
on the factors which determine the differing patterns of
neurodegeneration.

Summary

Huntington's disease (HD) is one of an increasing num- 
ber of neurodegenerative disorders caused by a CAG/ 
polyglutamine repeat expansion. Mice have been gen- 
erated that are transgenic for the 5' end of the human 
HD gene carrying (CAG)115-(CAG)150 repeat expansions.
In three lines, the transgene is ubiquitously expressed 
at both mRNA and protein level. Transgenic mice ex- 
hibit a progressive neurological phenotype that exhib- 
its many of the features of HD, including choreiform- 
like movements, involuntary stereotypic movements, 
tremor, and epileptic seizures, as well as nonmove- 
ment disorder components. This transgenic model will 
greatly assist in an eventual understanding of the mo- 
lecular pathology of HD and may open the way to the 
testing of intervention strategies. 

Introduction 

Huntington's disease (HD) is an autosomal dominant 
progressive neurodegenerative disorder (Harper, 1991). 



The onset of symptoms is generally in midlife although 
it can range from early childhood to >70 years. Anticipa- 
tion is observed, predominantly when the disease is 
inherited through the male line, with the result that 70% 
of juvenile cases inherit the disease from their father. 
The symptoms have an emotional, motor, and cognitive 
component. A detailed description of all aspects of HD 
can be found in Harper (1991). Chorea is a characteristic
feature of the motor disorder and is defined as excessive 
spontaneous movement, irregularly timed, randomly 
distributed, and abrupt. It can vary from being barely 
perceptible to extremely severe. It involves all parts of 
the body, can have repetitive and stereotypic elements, 
and may have a pseudopurposive appearance (Harper, 
1991). Other frequently observed motor abnormalities 
include dystonia (sustained muscle contraction), rigid- 
ity, bradykinesia (abnormally slow movements), oculo- 
motor dysfunction, and tremor. Cerebellar dysfunction, 
upper motorneuron abnormalities, epilepsy, and myo- 
clonus (brief shock-like muscle jerks) are rare except 
in the juvenile form of the disease, which commonly 
presents with a "Parkinsonlike rigidity." Voluntary move- 
ment disorders include fine motor incoordination, dys- 
arthria (impairment of articulation), and dysphagia (diffi- 
culty in swallowing). The emotional disorder is commonly 
depression and irritability, and the cognitive component 
comprises a subcortical dementia. The biochemical ba- 
sis of this disease is not understood, and there is no 
effective therapy. 

The HD mutation results in the expansion of a polyglu- 
tamine (polygln) tract in a large 350 kDa protein of un- 
known function (Huntington's Disease Collaborative Re- 
search Group, 1993). The normal and expanded HD 
allele sizes have been defined as CAG^ and CAG35-121
repeats, respectively. An inverse correlation between
age of onset and repeat length is most pronounced for
juvenile HD for which the longest repeats have been
observed (Huntington's Disease Collaborative Research
Group, 1993; Telenius et al., 1993). Despite the selective
cell death, the HD transcript is ubiquitously expressed
(Strong et al., 1993). The polyglutamines are success-
fully translated and the huntingtin protein (htt) products
arising from expanded alleles have been identified in
protein extracts from HD patients (Jou and Myers, 1995;
Trottier et al., 1995a).

CAG/gln expansion has been found to be the caus- 
ative mutation in five neurodegenerative diseases for 
which the gene has been cloned. In addition to HD, these 
include spinal and bulbar muscular atrophy (SBMA) (La 
Spada et al., 1991), spinocerebellar ataxia type 1 (SCA1)
(Orr et al., 1993), dentatorubral-pallidoluysian atrophy
(DRPLA) (Koide et al., 1994), and Machado Joseph dis-
ease (MJD or SCA3) (Kawaguchi et al., 1994). Many
aspects of the genetics and molecular biology are com-
mon to these diseases. They are autosomal dominant
(with the exception of X-linked SBMA) and show varying
degrees of anticipation on paternal transmission. The
size of the normal and expanded CAG repeat ranges
are comparable, and available data indicate that age of
onset correlations and patterns of repeat stability are
reproduced. A similar ubiquitous expression pattern is
also characteristic, and the presence of the expanded
forms of ataxin-1 (SCA1 protein) and atrophin-1 (DRPLA
protein) in lysates from patient tissues has been ob-
served (Servadio et al., 1995; Yazawa et al., 1995).

Despite the otherwise apparent universality of this 
mutation, the patterns of cell death differ between these 
diseases. In HD, the most striking atrophy occurs in the 
caudate nucleus, which is often reduced to a rim of 
tissue. The putamen and globus pallidus also undergo 
atrophy, and there are subtle changes in the cerebral 
cortex (Vonsattel et al., 1985). SBMA is a form of motor 
neuron disease with both spinal and bulbar motor neu- 
ron involvement (Kennedy et al., 1968). The SCA1 and 
SCA3 spinocerebellar ataxias are clearly distinguished 
by major neuropathological features: Purkinje cell, pon- 
tine nuclei, and inferior olivary nuclei degeneration in 
SCA1 (Zoghbi et al., 1993) and pontine nuclei and the 
molecular layer of the cerebellum in SCA3 (Durr et al., 
1996). In DRPLA, neuropathology includes the cerebellar
dentate nucleus, globus pallidus, red and subthalamic
nuclei, Purkinje cells, brain stem tegmentum, and the
lateral corticospinal tract (Takahashi et al., 1988). The
proteins containing the polygln repeats are otherwise 
unrelated. In SBMA, the repeat lies within the androgen 
receptor (La Spada et al., 1991), while the others are in 
novel genes of unknown function. Subcellular localiza- 
tion suggests differing roles for these proteins (DiFiglia 
et al., 1995; Servadio et al., 1995; Trottier et al., 1995a; 
Yazawa et al., 1995). 

It is essential that transgenic models of these diseases 
are developed. There have been two previous reports 
of a neurological phenotype observed in mice trans- 
genic for a protein carrying a polygln repeat expansion. 
The first used a SCA1 cDNA construct with (CAG)82 under
the control of a Purkinje cell specific promoter (Burright
et al., 1995). Three heterozygous lines overexpressing
the SCA1 transcript by 10- to 100-fold and two homozy-
gous lines showed a progressive ataxic phenotype be-
tween 12 and 26 weeks of age. The mice became clearly
ataxic when walking and routinely fell when attempting
to stand on their hind legs. Pathologic examination
showed significant loss of the Purkinje cell population
with Bergmann glial proliferation and shrinkage and glio-
sis of the molecular layer. More recently, transgenic
mice have been reported with a (CAG)79 version of the
SCA3 gene and also the (CAG)79 polygln tract in isolation,
both under the control of the Purkinje cell specific pro-
moter (Ikeda et al., 1996). Affected mice transgenic for
the isolated polygln tract were severely ataxic: they ex-
hibit a wide-based hind limb stance, frequently fall when
moving, and are unable to rear. Overt Purkinje cell death
was observed with secondary effects to the molecular
and granular cell layers. No phenotype was observed
in the mice transgenic for the entire mutated SCA3 gene.
The authors suggested that the polygln tracts are more
toxic in isolation than in the context of a protein, al-
though in the absence of any information concerning
transgene copy number, genomic structure of the inte-
gration sites, or expression levels, this interpretation
should be treated with caution. These reports have
shown that Purkinje cell specific overexpression of an
expanded polygln tract, either in the context of the SCA1
gene or in isolation, is toxic to Purkinje cells and causes
a corresponding ataxic phenotype.

In our initial attempt to generate a murine model of
HD, we have focused on the construction of a mutant
yeast artificial chromosome (YAC) for introduction by
pronuclear injection. Progress was severely hampered
by both instability of YAC intermediates and the severe
instability of highly expanded CAG repeats in yeast.
Consequently, to address the question of CAG repeat
stability in the mouse, transgenic lines were established
with a 1.9 kb human genomic fragment containing pro-
moter sequences and exon 1 carrying expansions of
approximately (CAG)130. Unexpectedly, this fragment has
been sufficient to generate a progressive neurological
phenotype that displays many of the characteristics of
HD. This is the first time that a model of one of these
diseases has been generated by a transgene driven from
an endogenous promoter. The availability of a mouse
model of the disease is extremely informative with re- 
gard to the size of the polygln expansion and level of 
expression required to produce a phenotype with a 
given age of onset in the mouse. This work suggests 
that the polygln-containing domain of the htt protein 
may be sufficient to generate a mouse model of HD. 

Results and Discussion 

Fragment Used for Transgenesis 
The microinjection fragment was a 1.9 kb SacI-EcoRI
fragment from the 5' end of the human HD gene isolated
from a phage genomic clone derived from an HD patient
(Figure 1a). It is composed of ~1 kb of 5' UTR se-
quences, exon 1 carrying expanded CAG repeats of
~130 units and the first 262 bp of intron 1. As the CAG
repeats are unstable when propagated in E. coli, the
DNA preparation used for microinjection contained a
heterogeneous set of repeats of varying size but of the
order of 130 units. In the event that an unspliced mRNA
should be transcribed from this fragment, an "in-frame"
stop codon immediately at the beginning of intron 1
would result in a truncated protein corresponding to the
first 90 amino acids of the published htt protein (repeat
size of (CAG)21).
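
The relationship between repeat size and the length of this truncated product can be sketched as simple arithmetic. The snippet below is only an illustration: the helper name is hypothetical, and it assumes that the 90 residues quoted for a (CAG)21 allele consist of 21 repeat-encoded residues plus a fixed remainder, so that each additional CAG unit adds one residue.

```python
# Illustrative sketch only; names are hypothetical, not from the paper.
# Assumption: the 90-residue product quoted for a (CAG)21 allele is made of
# 21 repeat-encoded residues plus a fixed non-repeat remainder.
NON_REPEAT_RESIDUES = 90 - 21


def truncated_protein_length(cag_repeats: int) -> int:
    """Predicted length (amino acids) of the truncated exon 1 product."""
    if cag_repeats < 0:
        raise ValueError("repeat count must be non-negative")
    return NON_REPEAT_RESIDUES + cag_repeats


# The published (CAG)21 allele and a repeat of the order of 130 units.
for n in (21, 130):
    print(f"(CAG){n}: {truncated_protein_length(n)} amino acids")
```

Under these assumptions a repeat of about 130 units would correspond to a product of roughly 200 residues, although the heterogeneous repeat sizes in the injected DNA mean the actual length differs between lines.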

Genomic Organization of the Integration Events 
Transgenic mice were generated by microinjection of 
single cell CBAxC57BL/6 embryos. Of 29 newborn mice, 
seven died neonatally, and of the remaining 22 pups, 
one male was transgenic. This founder (R6) was initially 
backcrossed to both C57BL/6 and to CBAxC57BL/6 fe- 
males. However, a subsequent need to optimize litter 
size has resulted in the maintenance of the transgene 
on the CBAxC57BL/6 hybrid background. F1 mice were
genotyped both by Southern analysis and by PCR to
determine the CAG repeat size. Figure 1b shows a
Southern blot of BamHI digested DNA from a number
of F1 progeny. It was possible to deduce that the micro-
injection fragment had integrated into five different re-
gions of the founder's genome. The predicted genomic
organization of the integration events is illustrated in 
Figure 1c. In lines R6/1 and R6/0, the fragment has 
integrated as an intact single copy, and in line R6/T as 






a highly truncated fragment. In line R6/0, the fragment
has most probably inserted adjacent to a repetitive ge-
nomic structure. When the probe 4G6PE0.2 is hybridized
to Southern blots of transgene genomic DNA digested
with BamHI, SmaI, PstI, or NcoI, in each case a band is
detected that has barely migrated into the gel. If the
same blots are probed with 4G6SN0.3, the 5' UTR probe,
bands of a more expected size range are seen. Line R6/2
most probably originated as a three copy integration
event, the flanking fragments having been subject to
deletions, with the result that this transgene functions
essentially as a single copy integrant. Finally, line R6/5
is represented by five bands on a BamHI Southern blot.
It is clear that four fragments have integrated as illus-
trated in Figure 1c. This includes both a tail-to-tail and
head-to-head arrangement. However, other hybridiza-
tion bands could not be explained by a straightforward



Figure 1. Microinjection Fragment and Identification of the
Integration Events

(a) Restriction map of the human genomic
fragment used for microinjection. An arrow
denotes the transcription start site and an aster-
isk indicates the position of an in-frame stop
codon at the beginning of intron 1. 4G6SN0.3
and 4G6PE0.2 are fragments used as hybrid-
ization probes, and solid triangles indicate
the location of PCR assays used for genotyp-
ing and RNA analysis.

(b) Southern blot of genomic DNA from the
R6 founder and a number of F1 progeny. DNA
was digested with BamHI and probed with
4G6PE0.2. The genotypes are indicated
above the lanes (1, 2, 0, T, or 5). A plus sign
indicates that the mouse also scored as
transgenic when typed with the CAG repeat
PCR assay. BamHI fragment sizes are as fol-
lows: R6/1, 20.0 kb; R6/2, 1.9 and 0.8 kb; R6/5,
6.0, 3.6, 2.5, 2.3, and 1.9 kb; R6/0, band
migrates close to slot (S); R6/T, 6.0 kb. The
R6/T genotype is negative with the CAG re-
peat PCR assay.

(c) Genomic organisation of the integration
sites of the transgenes. R6/0, R6/1, and R6/T
are single copy integrants although R6/T is
highly deleted. R6/2 probably originated as a
three copy integrant, the flanking fragments
having undergone deletions. (Asterisk) It has
not been possible to completely resolve the
structure of the R6/5 integration event. Three
of the five BamHI fragments can be ac-
counted for by the structure as drawn.



configuration, as in those illustrated, or by simple dele- 
tions. It seems likely, therefore, that a complicated re- 
arrangement must have occurred for which it has not 
been possible to completely unravel the genomic 
structure. 



Size of the CAG Expansion in Each 
of the Transgenic Lines 

Four of the transgenic lines, R6/0, R6/1, R6/2, and R6/5,
carry expanded CAG repeats. The size of the expan-
sion was determined by PCR amplification of the repeat
using a fluorescently labeled primer and subsequent
size determination using an ABI sequencer (Figure 2).
The peak sizes are as follows: R6/1, 116 repeat units;
R6/0, 142 repeat units; R6/2, 144 repeat units. Line R6/5
is more complicated with peaks at 128, 132, 135, 137,
and 156 repeat units. These repeat sizes are consider-
ably larger than those that have generally been reported
to cause the juvenile form of HD in humans. Both ga-
metic and somatic repeat instability have been observed
(manuscript submitted).

Segregation of the Integration Events 

The specific genotype frequencies found in 321 F1 mice
derived from the R6 founder are summarized in Table 1.
The integration events appear to segregate indepen-
dently but are only seen in certain combinations. The
founder is therefore a germ line chimera with one set of
germ cells containing the R6/0 and R6/T transgenes and
the other containing the R6/1, R6/2 and R6/5 transgenes.
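
The segregation argument can be checked with a short tally of the Table 1 counts. The sketch below is illustrative rather than the authors' analysis: the function name is hypothetical, and it assumes each integration site is a single unlinked hemizygous locus transmitted to half of the gametes, so that all genotype classes within a germ-cell population are equally likely.

```python
# Counts transcribed from Table 1 (transgenic F1 progeny only).
CLUSTER_A = {"R6/0": 56, "R6/T": 61, "R6/0 + R6/T": 62}
CLUSTER_B = {
    "R6/1": 8, "R6/2": 16, "R6/5": 15,
    "R6/1 + R6/2": 9, "R6/1 + R6/5": 8, "R6/2 + R6/5": 11,
    "R6/1 + R6/2 + R6/5": 4,
}
NONTRANSGENIC = 71  # Table 1 apportions these 56 and 15 between the clusters


def expected_negatives(transgenic: int, loci: int) -> float:
    """Expected non-transgenic offspring for one germ-cell population.

    Assumption (not from the paper): every locus is hemizygous, unlinked and
    transmitted to half the gametes, so all 2**loci genotype classes are
    equally likely and exactly one of them carries no transgene.
    """
    return transgenic / (2 ** loci - 1)


total_a = sum(CLUSTER_A.values())
total_b = sum(CLUSTER_B.values())
print("total progeny:", total_a + total_b + NONTRANSGENIC)
print("expected negatives, cluster A:", round(expected_negatives(total_a, 2), 1))
print("expected negatives, cluster B:", round(expected_negatives(total_b, 3), 1))
```

Under these assumptions the two expectations sum to about 70, close to the 71 nontransgenic mice observed, which is in line with a chimeric founder carrying two independent sets of integration events.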

Phenotype Observed in the R6/2 
Transgenic Line 

The age of onset in line R6/2 has been observed as early 
as four weeks (one mouse) but most frequently occurs 
between nine and eleven weeks. Age at death has gener-
ally been between 10 and 13 weeks although the mouse
with the age of onset at four weeks died at six and a 
half weeks. The mice display a progressive neurological 
phenotype. As far as can be ascertained, the mice re- 
main alert, exploratory and inquisitive, and responsive 

Table 1. Frequency of Genotypes Arising in 321 F1 Progeny

Genotype              N
R6/0                  56
R6/T                  61
R6/0 + R6/T           62
Negative^a            (56/71)
R6/1                  8
R6/2                  16
R6/5                  15
R6/1 + R6/2           9
R6/1 + R6/5           8
R6/2 + R6/5           11
R6/1 + R6/2 + R6/5    4
Negative^a            (15/71)

^a The total number of nontransgenic mice is divided between the
two genotype clusters in a proportion consistent with the genotype
frequencies.



to sensory stimuli. The phenotype is complex. There are 
a number of components to the motor disorder including 
a resting tremor, movements described as resembling 
chorea, stereotypic involuntary movements, and in 
some cases a mild ataxia manifesting as dysmetria. One 
of the first symptoms is a dyskinesia of the limbs when 
held by the tail. This progresses to an alternating clasp-
ing together and releasing of the feet until the mice clasp
their feet together immediately after they are picked up
(Figure 3a) and can no longer release this posture. The
mice develop a constant tremor that becomes progres- 
sively worse. The tremor tends to be less noticeable 
when they are quiet or asleep, but worsens under stress 
(for example, the removal of the cage lid) or if they reach 
for food or to climb out of the cage. As the disorder 
progresses, stereotypic involuntary movements are ap- 
parent, which include repetitive stroking of the nose and 
face, and a hind limb kicking/scratching motion. Sudden 
movements that involve the whole body and may resem- 
ble chorea are observed. These are rapid, abrupt, irregu- 
lar, and manifest as a shaking/shudder of the trunk. The 
mice do not develop a wide-based gait, can stand on 
their hind limbs and climb out of the cage without falling. 
They only consistently lose balance when sitting on their 
hind limbs, turning, and reaching round to groom their 
backs, which results in a somersault. The mice exhibit 
severe handling-induced epileptic seizures that can last 
for several minutes. 

At weaning, the R6/2 transgenes are indistinguishable 
from their normal litter mates. Coincident with the onset 
of motor symptoms, their weight plateaus and then pro- 
gressively decreases. In the end stages, mice have been 
observed to weigh as little as 60%-70% of their normal 
sibs. As the disease becomes more severe, they are 
very frequently observed to be eating but do not gain 
weight. It appears that the mice are eating rather than 
just breaking off food. Their food comprises an ex- 
panded chow, which does not crumble easily, and ex- 
cess food crumbs are not observed in the bedding. On 
autopsy, the mice are often emaciated with an overall 
loss of muscle bulk although food is observed in the 
stomach and fecal pellets in the gut. Histological analy- 
sis of muscle samples showed no evidence of a my- 
opathy. 



Characteristic vocalizations have been observed. 
These include a sound similar to that made by a newborn
litter, which resembles teeth chattering from cold,
but is likely to have a respiratory basis (since it occurs 
before the young mice have teeth). A second sound, a 
type of chirping noise, is more reminiscent of a bird than 
of a mouse. The mice are more likely to make these 
sounds when they are under stress (for example, away 
from the home cage). 

The mice appear to urinate more frequently. The bed- 
ding at the front of the cage becomes excessively wet 
as compared to that in cages housing normal mice. They 
are unlikely to be suffering from spastic bladders as the 
wetting of the bedding is not uniform. Urine tests in 
18 transgenic mice (11 male and 7 female) showed no 
abnormality in glucose or protein levels. Similarly, blood 
tests in two mice showed glucose and protein levels to 
be within the normal range. 

R6/2 females are sterile and, of ten R6/2 males that 
have been placed with females from a time just prior to 
expected sexual maturity, five have mated. Of these, 
one mouse produced one litter, two mice produced two 
litters and two produced four litters. On autopsy, the 
reproductive organs consistently appear vestigial or atrophied.
Females often have minuscule ovaries and a
hair-like uterus. Males have small testes, seminal ducts,
and coagulation glands. On histology, one male that had
failed to mate was found to have testicular atrophy with
an absence of spermatozoa, an atrophy of the epididymis
with aspermia, and no secretion present in the coagulation
gland.

The mice die suddenly and the cause of death is gen- 
erally unknown although one mouse was observed to 
die during an epileptic seizure. 

Dosage Effect on Age of Onset and Phenotype 
Severity in Complex Genotypes 
Lines R6/1, R6/2, and R6/5 have been established from
the founder. In the F1 generation, mice with all possible 
combinations of these transgenes were identified. Each 
aspect of the phenotype, as described for line R6/2, has 
been observed for the genotypes listed in Table 2. In 
the end stages of the disease, the transgenes are always 
considerably smaller than their normal littermates. The 
age of onset varies from <3 weeks (R6/1 + R6/2 + R6/5
genotype) to ~4 or 5 months (R6/1 line).

The (R6/1 + R6/2 + R6/5) genotype is the most severe.
Only four such mice were recovered in the F1 generation. 
The overall genotype frequency (Table 1) would have 
predicted more than this, and it is possible that some 
mice with this genotype died neonatally or in utero. All 
aspects of the phenotype are more severe and have a 
more rapid progression. The (R6/1 + R6/2 + R6/5) mice
are considerably smaller than their litter mates at wean- 
ing. For example, one weighed 5.2 g at 23 days of age 
as compared to a mean of 9.2 g for her female sibs. She 
reached a maximum weight of 7.5 g but was only 6.0 g 
at death at 51 days as compared to a mean of 16.3 g 
for her sibs. In contrast, line R6/1 has the latest age of 
onset and the slowest progression. The mice begin to 
exhibit the feet-clasping posture when suspended by 
the tail at ~4-5 months. At between 6 and 7 months, 

Figure 3. Comparison of R6 Transgenic Mice and Littermate Controls

(a) An R6/2 transgenic mouse demonstrating the feet-clasping
posture adopted when suspended by the tail. The normal mouse
holds its hind limbs outward in order to steady itself.

(b) The R6/2 mouse (17.7 g) and normal littermate (21.3 g) at
12 weeks of age. The transgenic mouse is thinner.

(c) An R6/1 + R6/2 (10.1 g) transgenic mouse and normal
littermate (19.6 g) at seven weeks, three days. There is a
considerable size difference.

some show a mild tremor and intermittently exhibit all
aspects of the involuntary movement disorder as de- 
scribed for the R6/2 line. Epileptic seizures have also 
been observed. The effect of transgene dosage on the 
size of the mice is illustrated in Figure 3. 
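
The statement that the overall genotype frequency in Table 1 would have predicted more than four triple-transgene mice can be checked with a short calculation. The sketch below is illustrative only: the text does not say how the expectation was derived, so two assumptions are tried side by side. The first assumes each of the three transgenes is inherited independently with probability 1/2; the second uses the carrier fractions observed in the R6/1, R6/2, R6/5 cluster of Table 1. All variable names are hypothetical.

```python
# Observed F1 counts for the R6/1, R6/2, R6/5 cluster, copied from Table 1.
# The empty tuple holds the 15 nontransgenic mice allotted to this cluster.
observed = {
    ("R6/1",): 8,
    ("R6/2",): 16,
    ("R6/5",): 15,
    ("R6/1", "R6/2"): 9,
    ("R6/1", "R6/5"): 8,
    ("R6/2", "R6/5"): 11,
    ("R6/1", "R6/2", "R6/5"): 4,
    (): 15,
}
LINES = ("R6/1", "R6/2", "R6/5")
total = sum(observed.values())


def carrier_fraction(line):
    """Fraction of mice in the cluster that carry the given transgene."""
    carriers = sum(n for genotype, n in observed.items() if line in genotype)
    return carriers / total


# Assumption 1 (not stated in the text): independent transmission of each
# transgene with probability 1/2.
expected_mendelian = total * 0.5 ** len(LINES)

# Assumption 2 (not stated in the text): independence, but with the
# carrier fractions actually observed in this cluster.
expected_empirical = float(total)
for line in LINES:
    expected_empirical *= carrier_fraction(line)

print(f"cluster size: {total}")
print(f"expected triple carriers (p = 1/2 each): {expected_mendelian:.2f}")
print(f"expected triple carriers (observed fractions): {expected_empirical:.2f}")
print(f"observed triple carriers: {observed[LINES]}")
```

Under either assumption the expected number exceeds the four mice recovered, which is in line with the suggestion that some mice of this genotype were lost before genotyping.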

On autopsy, atrophy or gross atrophy of the primary 
and secondary reproductive organs is routinely observed.
Otherwise, hepatic changes in the form of polyploid
hepatic nuclei and a loss of cytoplasmic mass with
no obvious cell death were the only consistent observation
resulting from a routine histopathological examination
(two R6/2 and six R6/1 + R6/5 mice in the end stages
of the disease and displaying all aspects of the pheno- 
type). Thymic atrophy is sometimes present, more fre- 
quently in the more severely affected lines, but this does 
not correlate with the presence or absence of pheno- 
typic features. In a few mice there is a slight deformation 
to the cranial vault resulting in a bony ridge over the

Table 2. Comparison of the Onset and Duration of the Phenotype Associated with the R6 Genotypes

Genotype              Age of Onset    Age at Last Litter          Age at Death
R6/1 + R6/2 + R6/5    <3 weeks        N/A                         4-7 weeks
R6/1 + R6/2           3-4 weeks       N/A                         6-8 weeks
R6/2 + R6/5           6-7 weeks       N/A                         8-12 weeks
R6/2                  9-11 weeks      6-9 weeks (a) (5 males)     10-13 weeks
R6/1 + R6/5           12-16 weeks     12 weeks (a) (1 male)       24-36 weeks
R6/1                  15-21 weeks     14 weeks (b) (1 male)       32-40 weeks (c)

(a) Mice bred continuously.
(b) Mouse failed to breed when cross set up at 19 weeks.
(c) Oldest R6/1 mouse is alive at 40 weeks.


cerebellum. This has been seen more frequently in the 
lines with the more severe phenotype but has also been 
observed in line R6/1 + R6/5.

A phenotype has not been observed in the heterozygous
(R6/5)/+ or (R6/0)/+ lines, the oldest mice now
being months. R6/5 homozygotes are developing 
symptoms at ~9 months, and the R6/5 transgene clearly 
contributes to the onset and progression of the disorder 
when in combination with R6/1 or R6/2 transgenes. 

Expression of the Transgene 

PCR primers specific to exon 1 of the human HD gene 
were used to examine the expression and tissue distri- 
bution of the transgenes. RT-PCR showed the trans- 
gene to be expressed in every tissue examined for lines 
R6/2 (Figure 4a), R6/1, and R6/5, but not to be expressed
in line R6/0. This ubiquitous pattern of expression for 
three of the lines suggests that the transgene is most 
likely expressed from promoter sequences present on 
the microinjection fragment. The absence of expression 
in line R6/0 is probably due to a position effect as South- 
ern analysis of this line predicts that the R6/0 transgene 
has integrated adjacent to a genomic region of unusual 
structure. Northern analysis revealed transcripts of 2.5 
and 2.3 kb in lines R6/1 and R6/2, respectively (Figure 
4b) and the suggestion of a larger R6/5 transcript. The 
4G6PE0.2 probe is derived from intron 1 of the human 
gene, and the presence of this sequence in the tran- 
scripts indicates that the human exon 1 has not spliced 
to mouse exonic sequences potentially occurring close 
to the integration sites. 

The level of expression of the transgene with respect 
to the endogenous mouse hd gene was assessed in 
total RNA from six tissues for each of the lines R6/1,
R6/2, R6/5, and R6/0. The PCR primers had identical 
recognition sequences in exon 1 of both the mouse and 
human genes and amplified mouse and human products 
of 121 and 114 bp, respectively. No expression was 
detected in the R6/0 transgene. While the comparative 
expression level varies between tissues, the average 
expression of the R6/2, R6/1, and R6/5 transgenes was
75%, 31%, and 77% of the endogenous level (data not 
shown). The tissue variability made absolute quantita- 
tion difficult, but this analysis nevertheless places the 
level of expression of the transgene within the range of 
the murine gene. 

The monoclonal antibody 1C2 binds specifically and
in a size-dependent manner to pathogenic polygln
expansions (Trottier et al., 1995b). This antibody was
used to immunoprobe Western blots of cell lysates
derived from a complete set of tissues from lines R6/1,
R6/2, and R6/5. A transgene-specific product was
detected in lines R6/2 and R6/5 in all tissues tested.
Figure 5 shows the Western blots obtained for a subset of
tissues from lines R6/2 and R6/5. The predicted size of 
the R6/2 protein would be ~23 kDa. The migration of 
the R6/2 and R6/5 products, at a size larger than this with 
respect to the markers, is consistent with the aberrant 
migration observed for the expanded polygln-containing
htt, ataxin-1, and atrophin-1 products when compared
to their normal counterparts. A constant band detected 
in all transgene and control tissues was found to be due 
to cross-reactivity of the antimouse secondary antibody. 
Comparison of the intensity of the constant band
between the R6/2 and R6/5 tissues suggests that the 
transgene protein is present at similar levels in these 
lines. A protein product has not been detected in line 
R6/1 despite testing ranges of polyacrylamide concen- 
tration and antibody dilution. It would be extremely un- 
likely that a protein product were not present in this line. 
One possible explanation is that the length of polygln 
tract in the R6/1 protein does not present an epitope to 
the 1C2 antibody. It is not clear from expression analysis
why the R6/5 phenotype should be so much milder than 
that observed in lines R6/2 and R6/1. 

Neuropathology 

Nine R6/2 transgenic mice, exhibiting a broad spectrum 
of severe symptoms of 2-3 weeks duration, and nine 
nontransgenic littermates were used for neuropathological
investigation. Brains from the transgenic animals
were consistently smaller than controls (controls 490 ±
9.8 mg, transgenes 395 ± 8.0 mg). Serial 40 µm sections
in either the coronal (12 mice) or horizontal (6 mice) 
planes were processed for either Nissl staining (Figures
6 and 7) or the immunocytochemical localization of glial
fibrillary acidic protein (GFAP) or the mouse macro- 
phage and microglial marker F4/80. The morphology of 
the central nervous system (CNS) in the transgenic mice 
appeared normal with no focal areas of malformation or 
neurodegeneration; however, sections of the brains of 
these animals were consistently smaller than those of 
their litter mates (19% ± 1.6%). This reduction in size 
appeared to be uniform throughout all CNS structures. 
Analysis of thionin-stained sections showed no evi- 
dence of neuronal cell loss, oligodendrocyte loss, reac- 

Figure 4. mRNA Size and Expression Pattern of the R6 Transgenes

(a) RT-PCR analysis of the expression of the transgene in the R6/2
line. RNA for PCR had (+) or had not (-) been treated with RT. The
expression pattern in RNA extracted from an R6/2 transgenic mouse
(top panel) and a littermate control (bottom panel) are shown. In
each case, the first track contains RNA from human fetal brain as
a positive control.

(b) Northern analysis of the transgene expression in the R6 lines.
All lanes contained 20 µg total brain RNA and the blot was hybridized
with the human 4G6PE0.2 intronic probe. As expected, a signal was
not detected in the human RNA lane. Products of 2.5 and 2.3 kb
are present in the R6/1 and R6/2 lanes, respectively, and a larger
band can be seen in the R6/5 lane. Lower panel: hybridization with
the mouse GAPDH probe. The reduction in intensity in the human
RNA track is due to cross species hybridization and not unequal
loading of RNA.



tive gliosis, or inflammatory change. These latter two 
observations were corroborated by the GFAP and F4/ 
80 stained sections, where the normal distribution of 
astrocytes and ramified microglia cells was observed in 
the absence of any indication of increased reactivity of 
astrocyte staining or the presence of rounded microglia 
or infiltrating macrophages. 
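
The reported brain weights can be related to the roughly 19% size reduction with a one-line calculation. The following is a minimal sketch using only the mean weights quoted above (controls 490 mg, transgenes 395 mg); the function name is hypothetical, and the ± terms are not propagated because the text does not state whether they are standard deviations or standard errors.

```python
def percent_reduction(control, treated):
    """Percentage by which `treated` is smaller than `control`."""
    return 100.0 * (control - treated) / control


control_mg = 490.0     # mean control brain weight quoted in the text
transgenic_mg = 395.0  # mean R6/2 brain weight quoted in the text

reduction = percent_reduction(control_mg, transgenic_mg)
print(f"Brain weight reduction: {reduction:.1f}%")
```

The mean weights alone give a reduction of about 19%, which agrees with the section-size figure quoted in the text.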
Cerebral Cortex and Hippocampus 
The cytoarchitectonic structure of the cerebral cortex 
was maintained in the frontal, temporal, occipital and 
parietal lobes, although all regions were noticeably thin- 
ner when measured between the pia and subcortical 
white matter. The large pyramidal cells of the motor 
regions of the frontal cortex were present in normal 
number and morphological appearance. Similarly the 
pyramidal cells of hippocampus, subiculum and para- 



hippocampal gyrus, the stellate cells of layer II of the 
entorhinal cortex, and the granule cells of the dentate 
gyrus were of normal size and distribution. 
Basal Ganglia 

A detailed analysis of the striatum, nucleus accumbens, 
globus pallidus, entopeduncular nucleus, subthalamic
nucleus, and substantia nigra demonstrated normal 
neuronal density and patterns of morphological diver- 
sity. The striatum is composed of a normal complement 
of medium-sized striatal neurons interspersed with 
fewer large and small neurons, together with satellite 
glia. The white matter of the corpus callosum and the 
fascicles of fibers forming the internal capsule contain 
as many oligodendrocytes as similar sections from con- 
trol mice. The striatum is again consistently smaller in 
the transgenic animals. 

Figure 5. Expression Profile of the Transgene Protein Products

Identification of the transgene protein product in the R6/2 and R6/5 lines using a monoclonal antibody (1C2) that specifically detects polygln
expansions.

(a) Identification of htt in lysates prepared from lymphoblastoid cell lines from a normal individual and an HD homozygote and fractionated
on a 6% SDS-PAGE gel. The size of the respective CAG expansions are indicated above the tracks. The position at which the fibrinogen
marker (330 kDa) migrates is indicated.

(b) Lysates from an R6/2 transgene (T) and littermate control (N) were fractionated on a 10% SDS-PAGE gel.

(c) Lysates from an R6/5 transgene (T) and a littermate control were fractionated on a 10% gel.

(d) Lysates from the R6/2 and R6/5 lines fractionated on a 10% SDS-PAGE gel.

(e) The filter in (d) stripped and reprobed with the secondary antimouse antibody, which detects the constant band seen in (b)-(d).



Cerebellum and Spinal Cord 

The granule cells, Purkinje cells, and the neurons of the 
molecular layer of the cerebellum show no differences 
from the control mice. Similarly, the large motor neurons 
of the anterior horn of the cervical and lumbar enlarge- 
ments of the spinal cord and the dorsal horns are again 
of normal appearance. 

Examination of all other areas of the CNS revealed no 
gross or microscopic abnormalities. 

Discussion 

Transgenic mice that develop a progressive neurologi- 
cal phenotype have been generated by the introduction 
of a genomic fragment containing exon 1 of the human 
HD gene. Four lines have been established, with CAG 
repeat expansions ranging from ~115 to 150 repeat
units. In the three lines that exhibit a phenotype, R6/1,
R6/2, and R6/5, the transgene has a ubiquitous mRNA 
and protein expression pattern. The transgene mRNA 
is most likely transcribed from human promoter ele- 
ments and extends into the flanking mouse sequences. 



The presence of human intron 1 sequences in the mRNA 
rules out the possibility that the human exon splices to 
mouse exonic sequences and therefore predicts that 
the corresponding transgene protein products contain 
69 amino acids in addition to the number of polygln 
residues encoded by the repeat expansion. 
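
The predicted protein composition (69 amino acids plus the polygln tract) allows a rough mass estimate to be set against the ~23 kDa quoted for the R6/2 product. The sketch below is an approximation under stated assumptions: the 69 non-repeat residues are given a generic average residue mass of 110 Da (an assumption, since their sequence is not listed here), glutamine residues are taken as 128.13 Da, and repeat lengths of roughly 115 to 150 are tried because the exact R6/2 repeat length is not given in this section. The function name is hypothetical.

```python
GLN_RESIDUE_DA = 128.13   # glutamine residue mass (free amino acid minus water)
AVG_RESIDUE_DA = 110.0    # assumed average mass for the 69 other residues
WATER_DA = 18.02          # one water added back for the chain termini
OTHER_RESIDUES = 69       # non-polygln residues predicted in the text


def predicted_mass_kda(n_gln, n_other=OTHER_RESIDUES):
    """Approximate mass in kDa of a peptide with n_gln glutamines."""
    mass_da = n_other * AVG_RESIDUE_DA + n_gln * GLN_RESIDUE_DA + WATER_DA
    return mass_da / 1000.0


for repeats in (115, 130, 150):
    print(f"{repeats} glutamines: ~{predicted_mass_kda(repeats):.1f} kDa")
```

Under these assumptions the estimate runs from about 22 to 27 kDa across the repeat range, which brackets the ~23 kDa prediction. It says nothing about gel mobility, which the text notes is aberrantly slow for expanded polygln proteins.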

The polygln expansions in the R6 transgenic mice are 
of a size considerably greater than is generally associ- 
ated with the juvenile form of HD. Even so, it is not 
possible to predict the phenotypic expression of such 
a mutation in the mouse. In HD, the major focus of
neuropathological change is in the striatum (part of the 
basal ganglia) and the cerebral cortex. The motor disor- 
der observed in the R6 lines is strongly suggestive of a 
basal ganglia lesion. The mice exhibit involuntary jerky 
shudders that have been described as resembling cho- 
rea and likened to the choreic movements observed in 
the neurological disease arising from canine distemper 
(Lauder et al., 1954). As far as we can ascertain, chorea
has not previously been described in mice (Lyon and
Searle, 1990). The neuropathological correlate of chorea
is accepted as a basal ganglia lesion. The pronounced 

Figure 6. Nissl Sections Through the Mouse Forebrain

Frontal section through the caudate/putamen (cp) at the level
of the lateral ventricle (v) of a normal littermate control (A)
and R6/2 transgenic mouse (B). The caudate putamen is shown in
higher power in C (control) and D (R6/2 transgene). Scale bars,
500 µm in (A) and (B), 80 µm in (C) and (D).



progressive resting tremor that occurs in all limbs, trunk, 
and head of affected mice also points to a basal ganglia 
abnormality. The observation of epileptic seizures is 
compatible with juvenile HD; however, while seizures 
have a cerebral focus, they could result from many imbalances
that are either intracranial or extracranial.

The R6 mice also suffer from a progressive decrease in
body weight and an overall loss of muscle bulk. Similarly, 
loss of body weight and a generalized lack of muscle 
bulk is a progressive and characteristic symptom of HD, 
despite increased calorific intake (Sanberg et al., 1981). 
The weight loss appears to be independent of the hyper- 
kinesia and its molecular basis is not understood 
(Harper, 1991). In addition, the R6 mice appear to urinate
more frequently as judged by wetting of the bedding. 
Urinary incontinence has also been noted in HD with 
symptoms including frequency, urgency, nocturia, and 
incontinence (Wheeler et al., 1985). Finally, chorea af- 
fecting face, jaw, and pharyngeal muscles affects both 
speech and swallowing and can also cause grunting and 
clicking sounds that may reflect respiratory movements 



(Harper, 1991). It is possible that the unusual vocaliza- 
tions made by the R6 transgenes arise by a similar mech- 
anism. 

A landmark study of the neuropathology of HD has 
classified the neuropathological changes into five 
grades that progress from grade 0, in which HD brains 
show no gross or microscopic abnormalities consistent 
with HD despite premortem symptomatology and posi- 
tive family history, to grade 4, in which the most extreme 
atrophy is observed (Vonsattel et al., 1985). The brains 
from the R6/2 transgenic mice were found to be on 
average 19% smaller than those of their normal lit- 
termates, a reduction in size that was maintained 
through all CNS structures. This finding is consistent 
with neuropathological changes occurring in HD in 
which it has been noted that a 30% reduction in brain 
weight in HD is associated with 20%-30% areal reduc- 
tions in cerebral cortex, white matter, hippocampus, 
amygdala, and thalamus (de la Monte et al., 1988). This 
atrophy was similar for all grades of HD, suggesting that
the shrinkage of these structures occurs early in the 

Figure 7. Nissl Sections throughout the CNS

Coronal sections through the cerebellum (A), cerebral cortex (B),
globus pallidus (C), subthalamic nucleus (D), entopeduncular
nucleus (E), and ventral horn of the lumbar spinal cord (F) of an
R6/2 transgenic mouse. Within the cerebellum, note the normal
density of the granule cell layer (gc), the monocellular layer of
Purkinje cells (pc), and the normal structure of the molecular
layer (m).



disease process, is not progressive, and reflects cell 
loss of both neurons as well as fibers. Interestingly, 
gliosis was not readily apparent in these structures, and 
the neuronal density was assessed to be normal (de la 
Monte et al., 1988). In contrast, a 60% reduction in the 
cross-sectional area of the caudate, putamen, and glo- 
bus pallidus increases with the higher grades of HD 
brains, indicating that these structures progressively degenerate
with prolonged survival. It is this specific progressive atrophy,
associated with reactive astrocytosis, that was not apparent in the
R6/2 transgenes and is also absent from grade 0 HD brains (Myers
et al., 1991; Vonsattel et al., 1985). The grade 0 brains came from
patients that had had HD symptomatology for between 2 and 13 years
(Vonsattel et al., 1985; Myers et al., 1988; Hedreen and Folstein,
1995), thereby providing no pathological correlate for chorea and
other early signs (Hedreen and Folstein, 1995). It seems likely that
the brains of the R6/2 transgenes have neuropathology 
consistent with that found in the early stages of HD and 
that the progression of the phenotype in these mice is 
so rapid that there is insufficient time for the progressive 
atrophy to take place. A detailed morphometric analysis 
did uncover a neuronal loss in the caudate of grade 0 
brains (Myers et al., 1991), and the absence of reactive 
astrocytosis was taken as evidence that the neuronal 
cell loss was not a recent event and may support the
hypothesis that the HD striatum is compromised from 
early in development (Myers et al., 1991). A detailed 



morphometric analysis of the R6/2 transgene brains is 
merited. The neuropathological analysis of the trans- 
genes was also focused on the additional regions that 
undergo neurodegeneration in the polygln expansion 
diseases as a whole, and no evidence of localized neuro- 
degeneration was identified. 

To date, five neurodegenerative diseases have been 
described that are caused by polygln expansions in 
ubiquitously expressed unrelated proteins. It is most 
probable that in each case the polygln expansion con- 
fers a gain of function to the proteins and that this may 
operate by a common molecular mechanism. It has been 
proposed that the specific selective cell death is di- 
rected by the remainder of the respective proteins. The 
R6 transgene protein products contain polygln tracts 
in a domain consisting of only 69 other amino acids 
amounting to ~3% of the htt protein. Therefore, the
R6 transgenic mice might be expected to represent a 
generic CAG/gln disease model rather than a specific 
model of HD. However, the R6 mice do not develop a 
pronounced ataxia as described by Burright et al. (1995)
and Ikeda et al. (1996). They do not develop a wide- 
based gait or fall while moving, are able to rear, and do 
not lose their righting response when turned onto their 
backs. This would suggest that there is no major cere- 
bellar lesion and that the R6 lines do not display the 
major movement disorder of SCA1, SCA3, and late onset
DRPLA. Similarly, they do not show a pronounced motor
neuron disease, although the SBMA symptoms in
humans are mild with a very slow progression and it
would probably be difficult to identify this component 
as part of the complex R6 phenotype. The diagnosis of 
HD and DRPLA was not infrequently confused before 
the advent of mutation analysis afforded an unequivocal 
test. Both disorders present with complex and variable 
symptoms that can include chorea, myoclonus, dysto- 
nia, dysarthria, and seizures. Some features are more 
or less associated with the juvenile or adult forms but 
the boundaries are not absolute. It would therefore be 
difficult to express any strong claims as to the specificity 
of a mouse model with respect to these two diseases. 

The R6 mice are the first transgenic model of a polygln
expansion disease in which the transgenes are ubiquitously
expressed (as are the mutant human genes). The
two previous reports of a neurological phenotype observed
in mice transgenic for a protein carrying a polygln
repeat expansion used a Purkinje cell specific promoter
to drive either a SCA1 cDNA construct with (CAG)82
(Burright et al., 1995) or a (CAG)79 polygln tract in isolation
(Ikeda et al., 1996). Purkinje cell death was identified
with a corresponding ataxic phenotype. It is possible 
that comparable overexpression of these constructs in 
any other cell would also demonstrate toxicity. The dramatic
dosage effect on the phenotype observed with
the R6 transgenes expressing at less than endogenous
levels suggests that ubiquitous overexpression of the R6
transgene could be lethal.

The apparent absence of specific neurodegeneration 
in the R6 mice supports the possibility that localized 
atrophy may be secondary to a primary imbalance that 
is directly responsible for the clinical symptoms that 
arise in HD. Indeed, replication of the patterns of cell 
death observed in HD by intrastriatal injections of quino- 
linic acid does not cause chorea in rats (Harper, 1991). 
It remains remarkable that the introduction of the ex- 
panded version of the polygln-containing domain of htt 
protein into transgenic mice has succeeded in reproduc- 
ing not only features of the movement disorder, but also 
other aspects of the complex HD phenotype. 

Two further lines of transgenic mice are required to 
determine the extent to which the R6 mice represent a 
model of HD. First, mice transgenic for the entire HD 
gene carrying repeat expansions of a comparable size 
must be generated. The large size of the HD gene neces- 
sitates that the construct be introduced in the form of 
a YAC clone (experiments in progress). An identical phe- 
notype would indicate that the remainder of the htt pro- 
tein is superfluous to the course of the disease, and any 
differences would aid in the dissection of the protein 
into functional domains. Second, mice transgenic for 
the nonexpanded CAG repeat version of the R6 lines 
have not been described in this paper. The original pur- 
pose of the R6 transgenes was to study repeat stability 
and, consequently, the nonexpanded controls were not
generated in parallel. However, it is important to charac- 
terize such mice, to rule out the unlikely scenario that 
the phenotype observed is the result of a novel peptide. 
Three founders have now been established that contain 
the SacI-EcoRI fragment with a (CAG)18 tract: Hdex/6,
Hdex/27, and Hdex/28. F1 mice derived from the Hdex/6 
founder are currently 20 weeks, and the mice show no 
signs of a neurological phenotype or weight loss. These 



mice are twice as old as the R6/2 mice at the onset of 
the phenotype. Quantitative RNA analysis shows the 
Hdex/6 transgene to be expressed at levels comparable 
to that in the R6/2 and R6/5 lines; however, it is not 
possible to use the 1C2 antibody to detect the Hdex/6 
transgene protein as this is specific to polyglutamine 
expansions. The Hdex lines will be bred to homozygosity 
and the mice observed over the course of at least one 
year. 

This work raises the intriguing possibility that exon 1 
of the HD gene carrying highly expanded repeats is 
sufficient to generate a transgenic model of HD. The 
mutation is predicted to operate by conferring a gain of 
function to the mutated protein to which some cells are 
particularly sensitive. The cell-selective toxicity may 
be afforded by differing compartmentalization of the 
polygln-carrying proteins or by the specificity of their 
intermolecular interactions. In order that the small R6 
transgene could initiate a chain of molecular events 
comparable to those involving the entire htt protein, it 
would be necessary to predict that the transgene occupies
the same subcellular localization. It has not been
possible to make this comparison as our attempts at
immunohistochemistry with the 1C2 antibody have been
consistently unsuccessful, and in addition, the subcellu- 
lar localization of htt remains to some extent controver- 
sial. If the selectivity of the cell death arises through the 
interacting proteins, the polygln-containing domain of 
the htt protein must be sufficient to convey this specific- 
ity. There may be some evidence to suggest that this 
could be the case, arising from the isolation of HAP1 
(huntingtin associated protein 1) (Li et al., 1995). HAP1
binds to htt containing a polygln tract of 21 residues, and the
association is enhanced by increasing lengths of the gln
repeat. There was no binding to atrophin-1 (the mutant 
protein in DRPLA) also containing 21 glutamines. 

It is impossible to predict the accuracy with which
transgenic mouse lines will model a corresponding hu- 
man disease. The R6 transgenes display many charac- 
teristics of HD, and had this phenotype arisen in mice 
transgenic for the entire mutant protein, the model would
have needed little justification. It is clearly possible that 
the polygln-containing domain may be the only part of 
the htt protein involved in the disease process. The R6 
transgenic mice already provide a valuable resource 
for uncovering the molecular pathology of HD and may 
present a target for the testing of potential therapeutic 
interventions. 



Experimental Procedures 
Genotyping 

DNA was prepared from tail biopsy and Southern blots and hybridizations
were as described (Monaco et al., 1985). CAG repeats were
sized by PCR using FAM-labeled primer 31329
(ATGAAGGCCTTCGAGTCCCTCAAGTCCTTC) and primer 33934
(GGCGGCTGAGGAAGCTGAGGA) in AM buffer (67 mM Tris-HCl [pH 8.8], 16.6 mM
(NH4)2SO4, 2.0 mM MgCl2, 0.17 mg/ml BSA, 10 mM 2-mercaptoethanol),
10% DMSO, 200 µM dNTPs, 8 ng/µl primers with 0.5 U/µl Taq
polymerase (Cetus). Cycling conditions were 90" @ 94°C, 25 x (30"
@ 94°C, 30" @ 65°C, 90" @ 72°C), 10' @ 72°C. PCR products were
sized using an ABI sequencer and the Genescan and Genotyper 
software packages. The size of the CAG repeat was 85 bp less than 
the size of the PCR product. 
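
The sizing rule stated above (repeat tract = PCR product minus 85 bp) can be written as a small helper. This is a minimal sketch, not the Genescan/Genotyper procedure itself: the function name is hypothetical, the 520 bp product is an invented example value rather than a measurement from the paper, and conversion to repeat units simply divides by three because each CAG unit is three base pairs.

```python
FLANKING_BP = 85     # non-repeat sequence within the amplicon, per the text
BP_PER_REPEAT = 3    # one CAG unit is three base pairs


def cag_repeat_size(product_bp):
    """Return (repeat tract length in bp, number of CAG units)."""
    if product_bp < FLANKING_BP:
        raise ValueError("PCR product is shorter than the flanking sequence")
    repeat_bp = product_bp - FLANKING_BP
    return repeat_bp, repeat_bp / BP_PER_REPEAT


# Hypothetical fragment size, for illustration only.
tract_bp, units = cag_repeat_size(520)
print(f"repeat tract: {tract_bp} bp, about {units:.0f} CAG units")
```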

RNA Analysis

Northern blots were prepared by standard methods and hybridized 
as described (Monaco et al., 1985). RNA was reverse transcribed
(14 U/µl MMTV RTase, BRL) in 50 mM KCl, 10 mM Tris-HCl (pH
9.0), 0.1% Triton X-100, 6.5 mM MgCl2, 10 mM DTT, 1 mM dNTPs,
10 ng/µl random hexamers with 0.35 U/µl RNasin (Promega) at
10' @ 23°C and then 40' @ 37°C. Primers for specific transgene
RNA detection were 33935 (CGGCTGAGGCAGCAGCGGCTGT) and
35093 (GCAGCAGCAGCAGCAACAGCCGCCACCGCC). PCR was in
AM buffer, 10% DMSO, 200 µM dNTPs, 10 ng/µl primer with 0.5
U/µl Taq polymerase (Cetus). Cycling conditions were 90" @ 94°C,
34 x (30" @ 94°C, 30" @ 68°C, 90" @ 72°C), 10' @ 72°C.
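
The two cycling programs given in these procedures differ only in cycle number and annealing temperature, so their programmed hold times are easy to compare. The sketch below is an illustration under assumptions: the final step is read as a 10 min extension at 72°C in both programs, ramp times between temperatures are ignored (so real run times would be longer), and the function name is hypothetical.

```python
def program_seconds(initial_s, cycles, cycle_steps_s, final_s):
    """Total programmed hold time in seconds, ignoring ramp times."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


# Genotyping PCR: 90 s denaturation, 25 cycles of 30 s / 30 s / 90 s,
# then a 10 min final extension.
genotyping_s = program_seconds(90, 25, (30, 30, 90), 10 * 60)

# RT-PCR: same step lengths, 34 cycles.
rt_pcr_s = program_seconds(90, 34, (30, 30, 90), 10 * 60)

print(f"genotyping PCR: {genotyping_s} s ({genotyping_s / 60:.1f} min)")
print(f"RT-PCR:         {rt_pcr_s} s ({rt_pcr_s / 60:.1f} min)")
```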

Protein Analysis 

Frozen tissue was homogenized in 50-100 µl 50 mM Tris (pH 8.0),
150 mM NaCl, 1% NP-40, 0.5% deoxycholate, 0.1% SDS, and 1
mM 2-mercaptoethanol with 1 mM PMSF, 0.5 mM DTT, 25 mM 
benzamidine and leupeptin, pepstatin and chymostatin each at 200 
ng/ml. Homogenates were sonicated on ice 10-20 s, spun at high 
speed at 4°C, and the supernatant transferred to a fresh tube. Protein 
was quantified by the Bradford assay when in sufficient quantity. 
Approximately 50 µg of protein was loaded per track onto 6% or
10% SDS-PAGE gels. Kaleidoscope prestained standards were
used as size markers (Biorad). Fibrinogen (Sigma) was added as a
size marker of 330 kDa (Jou and Myers, 1995). Proteins were transferred
to PVDF membranes (Biorad) that were blocked at 4°C overnight
in PBS with 5% nonfat dry milk and 2% fetal calf serum.
Immunoprobing with antibody 1C2 was at a 1:2000 dilution in PBS
with 0.5% nonfat dry milk for 1 hr at RT. Washes were in PBS
containing 1% NP-40 and 1% fetal calf serum. Secondary antibody
probing and detection was by use of the ECL kit (Amersham). 

Histopathology 

Brains from nine R6/2 transgenes and nine nontransgenic littermates
were analyzed for neuropathological change. A 1:3 series of sections
was stained for Nissl substance with thionin, or processed free
floating for the immunocytochemical localization of the glial marker,
glial fibrillary acidic protein (GFAP), or the macrophage/microglial
marker F4/80. Nuclear cell groups within the mouse brain were
verified by reference to Sidman, Angevine, and Taber-Pierce (Sid- 
man et al., 1971). 